A running labbook for ideas in the Ahnold project

Bioregions

Looking at @Hamilton2010, bioregions clearly play a role. You could include them simply as covariates. But, that doesn’t account for the potential interactions between MPAs and regions. Might need to either run them as seperate models (worth it to see), with interaction effects with MPA to see if there are differences among bioregions in MPA effects

Interspecies Interactions

Can you look at the spatial-covariance matrix of these species?

Or is there enough data to sugihara the damn thing, and ask whether some species seem to cause decreases in others?

Landings data

Can be found here Can pretty easily figure out a way to scrape all this

Fixed effects, random effects, and errors in variables, and clustering

It’s important that you get this straightened out.

The “errors in variables” relates to the idea that your covariate data are now random draws from some distribution. So, for example, you observe length data, which you fit to a model that predicts length data and a standard deviation, and you pass those predicted length data to some other part of the model. See box 6.2.2 in bayesian primer. This is just basically saying as another random variable in the liklihood, instead of taking it as given (where you would just use the length data)

Basically, what I’m trying to figure out here is how to properly deal with the the clustering of the data in the time series. So, for any categorical variable that is at a higher level than the data, it seems reasonable to just give it a prior (e.g. all the year coefficients share a common prior), all the species effects share a prior by group, etc, where all of those share a common prior.

What do you do about things like linf though? Repeated a whole ton over time, but you can’t just give it a hierarchical prior…. Looking at Gelman and Hill

OOOOOOOHHHHHH, maybe this is the right approach. Give it species fixed effects, and then make the species fixed effects a regression of species effects coefficients on life history traits. So, a multi-level modeling approach, like Gelman and Hill page 241.

From a hierarchichal perspective then, the idea is now to think of these as a bunch of nested regressions for each level of the data, with the right random effects linking each layer together.

So, you have year terms, and those year terms are a function of yearly things like temperature and el niño, species terms, etc. The problem that you’ve been having, and this probably explains some of the convergance issues, is that you have massive colinearity when you try and include fixed effects for things like species, and then the covariates at the species level? So, this allows you to sort of get the best of both worlds, where you now get the “intercepts” by the right thing, but those intercepts take into account the data at the intercept level that you have that can influence the outcome. So it looks though from Gelman and Hill page 281 that to do this, you don’t have to include those terms in the mixed effects part of the regression itself, but only put them in, but with the appropriate priors. So, that’s where the common prior comes in? You’ll just need to mess with this one with code once you can code it in STAN

You should test this by getting the thing written in STAN, and comparing two versions, one where you do it the usual way (estimate the mean of the prior), one where you include covariates in the model, the other where you fit them seperately

AHA@@!!@!#@!@#! THAT’S WHY THE MEAN OF THE PRIOR IS 0!!!! think about it. Say you want to include fixed effects for each site. Under the model as it get’s written all over the place here, they are distributed with mean(u,sigma). Now, when you include the fixed effects mean in that context: those are deviations from a mean 0! So, you’ve already accounted for the change from the mean effect by including the intercept. If you then say include site specific temperature in there, you’d just chuck it in as a variable, which would do the exact same thing as saying N(site + temp,sigma), or site + temp + n(0,sigma). All those terms are getting soaked up in the intercept. Suppose that it should be N(.2,sigma). So, no matter what, ever year term is going to be .2 +- some term. That is the same then as just adding .2 to the intercept, and then setting the mean of the prior to 0. So, they real key here then is simply specifying the group-level standard deviation. Wow.

Everything is a random effect in bayes, the key is is it grouped

Model Structure

Let’s go through the thing from scratch and think through how you want this thing to look, from a model diagramming perspective.

Hurdle component

Data

So may be you need to be thinking about this kind of like a CPUE standardization, where on one hand you’re fitting a poisson distribution to the transect data, and using that to estimate densities, based on deterministic equations and a sigma (or maybe treating that part as data to make things a bit easier), and then fitting the regression to those densities. The challenging part is how you rationalize the two distributions for density; that implied by the regressin and that implied by the error from the logit model

AHA, so the key here is thinking about the marginal posterior of the regression coefficients, not just the regression coefficients. The sigma of the regression model us saying taking those data as observations from a ranfom event, how well does the model explain that. But, that’s just under that one scenario. So, if you were doing this in a simple way where the posterior is analytical, then you get the betas out. But, the full marginal now would take into account the range of worlds that the densities could live in, so you’d sample from the joint posterior to get the betas of interest!

Process

See box 6.2.2 in a Bayesian Primer for ideas on this.

The way I’m leaning towards is something like this: There is a true biomass that is a random draw from a distribution with mean observed biomass and a variance defined by…. something (observer, group, a function of kelp, vis, etc.). The inverse of that, that the observed are a random draw from some true distribution with mean true and sigma above seems hard to do in this context: you’d have to fit a massive number, or possibly an unidentifiable number, of means, unless you want to say that that there’s some clustering (e.g. mean by species by site by year, which I think are the raw aggregation anyways).

Ah wait, you’re making this wayyyy to complicated. The thing you’re thinking about here are errors in predictor variables, not dependent variables like the density, which is what you’re trying to explain. Under a traditional Bayesian framework, the dependent variabels are already considered random draws (until they are obsered), while the rest are considered data. But, the stuff you were thinking of there are for independent variables, things like temperature. So, you could say that temperature is imperfectyl measured, and comes from a distribution with some true mean and sigma.

So, you don’t need to mess with the “errors” in density, since it’s already considered a random draw, from a distribution with mean expeced density from the model itself, and a sigma estiamted by the model.

Now, where things could then get interesting is using a model to estimate the sigmas. i.e. sigma is a linear function of say a constant, plus kelp cover + vis + research group… etc.

Now, that’s all assuming that the biomass densities are in fact your data. There’s another way you could do this, somewhat similar to @Karnauskas2011. The idea here. you have twoish data sources. One are the actual lengths. The other are the observed densities. The goal is still to fit a regression to the densities. The issue now though is that you’re going to fit the regression to densities estimated from the raw length data. So, you’d take the length data. You’d then use that length data to build up a model of densities that you then fit the regression on. The problem is, you’d need a model relating something back to the length data, which you aren’t really doing here. i.e what would the lengths be conditional on? Alternatively, you treat the lengths as data, and the densities as random variables. Now, you have the observed densities, that are draws from a distribution with mean of predicted density, where predicted density is a function of the lengths? That is a bit of double dipping though. Now one option could be to train the regression on “density” data generated from the lengths. So now, you say the observed densities are drawn from a distribution predicted by the regression, where the regression is trained on the expected densities predicted from the length data, estimating sigmas all the way. This seems maybe the most robust option, but also a royal pain in the ass. And do you really gain a whole lot from that?

It seems like you have something like three different model structures all models can share a common hierarchical nature to the data, the question really is how do you want to model errors, especially conditional on what you really care about are doing a good job on the estimators, not on the out-of-sample prediction

  1. The standard model, with potential for clustering of errors in densities as a function of appropriate levels (e.g. species groups, etc. ).That’s really what equation 6.2.51 in BP is doing: making a more complex model of the error structure associated with the observations.

  2. A model for standard deviations in the densities, where sigma is a linear function of an intercept and apprpriate covariates (like visibility, kelp, etc.) Could be an interesting way to test things that need to be accounted for in the model. That’s really what equation 6.2.51 in BP is doing: making a more complex model of the error structure associated with the observations.

  3. A data-generation model, where the length frequency are taken as perfectly observed, or are used to generate draws from a poisson, a la @Karnauskas2011, converted to densities, and then the regression is fit to those densities, and then used to predict the observed densities. Seems like a worse and worse idea, especially if you’re not as interested in prediction as you are in estimation.

Parameters

Let’s think about the covariates first. The question is basically how do you want to cluster the errors. #### Sites

You have site specific parameters, like location, region, etc. These stay constant at each site over time. Those should almost certainly be modeled as random effects, where maybe each site gets an intercept, drawn from the region that it’s in, and each region then get’s an uninformative prior, since the region’s aren’t really random effects, but constant enities in this world.

Now one question I have then is how does that fit in a regression framework, i.e. relative to “controlling” for the region effect in the regression itself. I don’t think that it matters, you just need to stop thinking abotu things so linearly. You’re only goal is to model the variation in density as a function of things, and this goes in there, the effect of region just isn’t a linear additive term, but rather affects the site specific terms

Species

You also have a bunch of species-specific effects. Now, on one level, these make more sense to me as fixed effects (the idea is that they aren’t really random draws from a population, but rather “true” values). But, where I get confused then is how to deal with these correctly in a panel framework, given that they are repeated. If I remember my Gelman, the way to deal with this in the year terms for example is by giving them a hierarchichal nature.

So, maybe one way to think of all the life history things is that each life history value is a draw from a “random” model that genrates life history traits. So, you estimate the coefficients for linf, where linf is a random variable drawn a global distribution.

AHA, i think your confusion is in differentiating the data from the coefficients. I think the “random effects” vs “fixed effects” question relates to how you want to model the data itself, as either the data, or draws from some distribution that you then estimate as well.

“In Bayesian hierarchical modeling, random effects are used to describe variation that occurs beyond variation that arises from sampling alone”

Hobbs, N. Thompson; Hooten, Mevin B.. Bayesian Models: A Statistical Primer for Ecologists (Page 114). Princeton University Press. Kindle Edition.

Priors

Jan 3 2017

New thoughts on model structure

OK, after doing some reading I think I’m geting a better handle on this. The key question: how do you properly account for the sampling nature of the MLPA data

The problem is you’ve been thinking about this rather confusedly between the unclear distinction between hierarchical and state space (see your evernore entry on state space modeling).

So let’s start from the ground up.

  1. You observe samples of length data, binned by size
  2. These sample arrise from a underlying true length structure, let’s say a Poisson, meaning that the mean and SD are the same. So, you fit a model that replicates this process WRONG!!!
    • The samples are the data! they are the mean values of the true poisson distribution. So, if that’s the case, none of this matters ha, since there’s nothing to estimate.

If that’s the case, maybe you go back to an explicit observation model, where the error in the length structures/biomass are a function of covariates.

library(tidyverse)
huh <- rpois(1000,100)
data_frame(huh = huh) %>% 
  ggplot(aes(huh)) + 
  geom_bar()

rmultinom(10, size = 12, prob = c(0.1,0.2,0.8,.1))
     [,1] [,2] [,3] [,4] [,5] [,6] [,7] [,8] [,9] [,10]
[1,]    1    0    1    0    0    1    2    2    0     2
[2,]    2    0    3    4    3    3    0    0    2     1
[3,]    7    9    8    6    9    8    7   10    9     8
[4,]    2    3    0    2    0    0    3    0    1     1

1/4/17

OK, that was a good road to head down, but not a great place to start. It seems like there’s not really a huge upside to be had from the crazy length-up modeling process. At the end of the day, unless you introduce a bias term or something like that, you’re just going to get the mean of the prediction back. So, you could certainly introduce a bit more uncertainty, but not really a substantive change in the outcome.

library(tidyverse)
dat <- read_csv('../data/UCSB_FISH raw thru 2013.csv')
Parsed with column specification:
cols(
  .default = col_character(),
  year = col_integer(),
  month = col_integer(),
  day = col_integer(),
  transect = col_integer(),
  count = col_integer(),
  fish_tl = col_double(),
  min_tl = col_integer(),
  max_tl = col_integer(),
  depth = col_double(),
  vis = col_double(),
  temp = col_double(),
  pctcnpy = col_integer()
)
See spec(...) for full column specifications.
dat <-  dat %>% 
  mutate(size_range = ifelse(is.na(max_tl - min_tl), 0, max_tl - min_tl ))
dat %>% 
  ggplot(aes(size_range)) +
  geom_histogram() + 
  scale_y_log10()

Almost all the observations have no real range to them. Looking at a histogram of the range of the sizes reported (max - min), where 0 corresponds to no range reported, almost all are within a few centimeters, which really isn’t going to play our in the biomass all that much. only NaN% of samples have size ranges above 10 centimeters. So, to be really really precise, you could simulate length frequences for those, but really hardly seems worth the effort.

The bigger question then is if you want to try and model bias in the observations; e.g. counts of certain species really should be higher/lower under different conditions. Something like a “standardized” survey index. i.e. maybe counts should be higher, taking into account poor vis or the like. Basically, it will be important if it introduces bias, or there is substantially more error around particular kinds of species. But beyond that, not like your analysis is fundamentally flawed or anything, and it’s unclear how much benefit you get from controlling for those factors at the ground level, vs. controlling for them in the regression itself (i.e. all else being equal visibility drives down/up counts).

There’s still the biomass issue (converting lengths into biomass). There’s obviously error in there, but probably not bias, so again, the issue would really be in just adding more noise. It’s hard to know how you’d deal with that since there’s not really a signal in the data to tell you anything about that error. It would be nice to work with Jen to do the translation from lengths to densities right there in the model, but again that seems like a second order thing: I don’t think the existing densities are wrong or anything. The biggest thing though is that doing the densities raw would allow you to actually tally them up at the aggregation level you want, instead of taking means/medians across space and time, which could disguise some trends. Let’s work on that actually.

So, I think you’re back go the important thing being doing a good job of dealing with the hierarchical nature of the data themselves. So let’s spend the next few days focusing on that.

Data Exploration

Let’s go back to square one and think a little about the nature of the data that you’re dealing with, and explore the relationships of the data with density and length.

length_data <- read_csv('../data/UCSB_FISH raw thru 2013.csv') %>% 
    magrittr::set_colnames(.,tolower(colnames(.)))
Parsed with column specification:
cols(
  .default = col_character(),
  year = col_integer(),
  month = col_integer(),
  day = col_integer(),
  transect = col_integer(),
  count = col_integer(),
  fish_tl = col_double(),
  min_tl = col_integer(),
  max_tl = col_integer(),
  depth = col_double(),
  vis = col_double(),
  temp = col_double(),
  pctcnpy = col_integer()
)
See spec(...) for full column specifications.
conditions_data <- length_data %>% 
  group_by(site,side,year) %>% 
  summarise(mean_temp = mean(temp, na.rm = T),
            mean_kelp = mean(pctcnpy, na.rm = T),
            mean_vis = mean(vis, na.rm = T)) 
  
life_history_data <- read_csv('../data/VRG Fish Life History in MPA_04_08_11_12 11-Mar-2014.csv') %>%
  rename(classcode = pisco_classcode) %>% 
  magrittr::set_colnames(.,tolower(colnames(.)))
Parsed with column specification:
cols(
  .default = col_character(),
  include = col_integer(),
  Juv.cut.cm = col_double(),
  WL_a = col_double(),
  WL_b = col_double(),
  LC.a._for_WL = col_double(),
  LC.b._for_WL = col_double(),
  VBGF.Linf = col_double(),
  VBGF.k = col_double(),
  VBGF.t0 = col_double(),
  VBGF.Linf.F = col_double(),
  VBGF.k.F = col_double(),
  VBGF.t0.F = col_double(),
  VBGF.Linf.M = col_double(),
  VBGF.k.M = col_double(),
  VBGF.t0.M = col_double(),
  LC.a._for_VBGF = col_double(),
  LC.b._for_VBGF = col_double(),
  F_a = col_double(),
  F_b = col_double(),
  Size_Mature_cm = col_double()
  # ... with 7 more columns
)
See spec(...) for full column specifications.
site_data <- read_csv('../data/Final_Site_Table_UCSB.csv') %>%
    magrittr::set_colnames(.,tolower(colnames(.)))
Parsed with column specification:
cols(
  SITE = col_character(),
  SIDE = col_character(),
  SITE_SIDE = col_character(),
  MPAGROUP = col_character(),
  MPA_STATUS = col_character(),
  RESERVE = col_character(),
  USE = col_character(),
  REGION = col_character(),
  YEAR_MPA = col_integer(),
  MPAAREANM2 = col_double(),
  MPASHORENM = col_double(),
  lat_wgs84 = col_double(),
  lon_wgs84 = col_double()
)
length_data <- length_data %>% 
  left_join(life_history_data, by = 'classcode') %>% 
  left_join(site_data, by = c('site','side'))

Converting length to biomass/density

One thing I’d like to be able to do is move directly from the lengths to the density at any aggregation that I’m interested in by actually tallying the length structure. As it stands right now, you need to take means/medians to aggregate densities over time, which I don’t really like

Let’s take a look at the actual density data that Jen provides

density_data <- read_csv('../data/ci_reserve_data_final3 txt.csv') %>%
  magrittr::set_colnames(.,tolower(colnames(.))) %>% 
  gather('concat.name','value', grep('_',colnames(.)),convert = T) %>%
  mutate(data.type = gsub('\\_.*', '', concat.name),
         classcode = gsub('.*\\_','',concat.name)) %>%
  mutate(value = as.numeric(value)) %>%
  spread(data.type,value) %>%
  rename(site_side = site.side)
Parsed with column specification:
cols(
  .default = col_double(),
  site = col_character(),
  side = col_character(),
  year = col_integer(),
  fishtrans = col_integer(),
  fish_BATH = col_integer(),
  fish_AINE = col_integer(),
  fish_CFAL = col_integer(),
  fish_CPUG = col_integer(),
  fish_NUNI = col_integer(),
  fish_ACAL = col_integer(),
  fish_HAZU = col_integer(),
  fish_SDAL = col_integer(),
  fish_SUMB = col_integer(),
  fish_PHOL = col_integer(),
  fish_RPRO = col_integer(),
  fish_XCAL = col_integer(),
  fish_RALL = col_integer(),
  fish_NCEP = col_integer(),
  fish_BOTH = col_integer(),
  fish_AVUL = col_integer()
  # ... with 59 more columns
)
See spec(...) for full column specifications.
density_example <- density_data %>% 
  filter(is.na(biomass) == F & biomass >0) %>% 
  sample_n(1)
density_example

Let’s take a look and see if you can replicate this density example.

length_to_weight <-
  function(mean_length,
  min_length,
  max_length,
  count,
  weight_a,
  weight_b,
  length_units = 'cm',
  weight_units = 'g',
  length_for_weight_units = 'mm',
  length_type_for_weight,
  tl_sl_a,
  tl_sl_b,
  tl_sl_type,
  tl_sl_formula) {
  #
  # generate_lengths <- function(count,mean_length, min_length, max_length){
  
  if (is.na(count) | count == 0){
    
    outweight <-  0
  }  else {
    
  if (is.na(min_length) |
  is.na(max_length)) {
  #generate distribution of lengths
  
  lengths <-  rep(mean_length, count)
  
  } else{
  # lengths <-  pmax(min_length,pmin(max_length,rpois(count, lambda = mean_length)))
  lengths <- runif(count, min = min_length, max = max_length)
  }
  if (length_type_for_weight == 'SL') {
  if (tl_sl_type  == 'TYPICAL') {
  weight_lengths <-  lengths * tl_sl_a + tl_sl_b
  } else{
  weight_lengths <- (lengths - tl_sl_b) / tl_sl_a
  
  }
  
  } else {
  weight_lengths <-  lengths
  }
  
  if (length_units == 'cm' & length_for_weight_units == 'mm') {
  weight_lengths <- weight_lengths * 10
  }
  
  weight <-  weight_a * weight_lengths ^ weight_b
  
  if (weight_units == 'kg') {
  weight <- weight * 1000
  }
  outweight = sum(weight)
  }
    return(outweight)
  } #close function
  
length_example <- length_data %>% 
  filter(classcode == toupper(density_example$classcode)
, site == density_example$site, 
         side == density_example$side, year == density_example$year) 
length_example <-   length_data %>% 
  filter(is.na(commonname) == F) %>% 
  mutate(biomass_g = pmap_dbl(list(mean_length = fish_tl,
                                      min_length = min_tl,
                                      max_length = max_tl,
                                      count = count,
                                      weight_a = wl_a,
                                      weight_b = wl_b,
                                      length_type_for_weight = wl_input_length,
                                      length_for_weight_units = wl_l_units,
                                      tl_sl_a = lc.a._for_wl,
                                      tl_sl_b = lc.b._for_wl,
                                      tl_sl_type = lc_type_for_wl,
                                      tl_sl_formula = ll_equation_for_wl), length_to_weight))
NAs producedNAs producedNAs producedNAs produced
length_example %>% 
  select(biomass_g)
biomass_data <- length_example %>% 
  group_by(classcode, site, side, year, transect) %>% 
  summarise(total_biomass_g = sum(biomass_g)) 

First, need to add back in zeros. You need a function that goes through trip by trip, and adds in zeros for all species seen at that site at some point but not on that trip.

species_sightings <- length_data %>% 
  group_by(site) %>% 
  summarise(species_seen = list(unique(classcode)))
biomass_data <- biomass_data %>% 
  ungroup() %>% 
  select(site,side,year, transect) %>% 
  unique() %>%  {
  pmap(
    list(
      this_site = .$site,
      this_side = .$side,
      this_year = .$year,
      this_transect = .$transect
    ),
    add_missing_fish,
    observations = biomass_data,
    species_sightings = species_sightings
  )
} %>% 
  bind_rows()
man_density_data <- biomass_data %>% 
  group_by(classcode, site, side,year) %>% 
  summarise(mean_biomass_g = mean(total_biomass_g, na.rm = T)) %>% 
  mutate(
            biomass_g_per_m2 = mean_biomass_g / (30*4),
            biomass_g_per_hectare = biomass_g_per_m2 * 10000,
            biomass_ton_per_hectare = biomass_g_per_hectare * 1e-6)

OK! You’ve got hand calculated densities now, let’s compare them to jens

density_comp_plot <- density_data %>% 
  select(classcode, site, side, year, biomass) %>% 
  mutate(classcode = toupper(classcode)) %>% 
  left_join(man_density_data, by = c('classcode','site','side','year')) %>% 
  left_join(life_history_data, by = 'classcode') %>% 
  ggplot(aes(biomass,biomass_ton_per_hectare, color = commonname)) + 
             geom_abline(aes(intercept = 0, slope = 1), linetype= 2) +
           geom_point() + 
  scale_color_discrete(guide = F) + 
  labs( y = 'Biomass (tons per hectare) - calculated from lengths and weights', x = 'Biomass (tons per hectare) - from ci_reserve_data_final3 txt.csv', 
  caption = 'Resolution is at species-year-site-side')
density_comp_plot
ggsave(density_comp_plot, file = 'density comparison plot.pdf',dev = cairo_pdf)
Saving 7.29 x 4.51 in image

# plotly::ggplotly(density_comp_plot)

Not Bad. It’s a good starting point, and makes me confident that I’m understanding things right. You’ll need to talk with Jen to figure out why this isn’t 1:1. This also let’s you compare outcomes under the two sources. ## Exploring relationships

Let’s dig into things a bit here and just look at trends in the (to start with) two versions of the database

man_density_data <- man_density_data %>% 
  left_join(life_history_data, by = 'classcode') %>% 
  left_join(site_data, by = c('site','side'))
man_density_data %>% 
  filter(is.na(targeted) == F) %>% 
  group_by(region, year, targeted) %>% 
  summarise(median_biomass = mean(biomass_ton_per_hectare, na.rm = T)) %>% 
  ggplot(aes(year,median_biomass, color = targeted)) + 
  geom_line() + 
  facet_wrap(~region)

Now same thing, but with Jen’s data

density_data %>% 
  mutate(classcode = toupper(classcode)) %>% 
    left_join(life_history_data, by = 'classcode') %>% 
  filter(is.na(targeted) == F) %>% 
  group_by(region, year, targeted) %>% 
  summarise(mean_biomass = mean(biomass, na.rm = T)) %>% 
  ggplot(aes(year,mean_biomass, color = targeted)) + 
  geom_line() + 
  facet_wrap(~region)

And just a really quick regression exploration

reg_data <- density_data %>% 
  mutate(classcode = toupper(classcode)) %>% 
    left_join(life_history_data, by = 'classcode') %>% 
      left_join(conditions_data, by = c('site','side','year')) %>% 
  filter(is.na(targeted) == F) %>% 
  filter(biomass > 0) %>% 
  mutate(log_biomass = log(biomass),
         mlpa_in_effect = as.numeric(year > 2003),
         fished = as.numeric(targeted == 'Targeted'),
         did = as.numeric(fished * year * mlpa_in_effect))
reg_fmla <- as.formula('log_biomass ~  as.factor(year) + fished +  as.factor(did) + trophicgroup + vbgf.linf +
                       mean_temp')
basic_reg <- lm(reg_fmla, data = reg_data)
# stan_reg <- rstanarm::stan_glm(reg_fmla, data = reg_data)
summary(basic_reg)

Call:
lm(formula = reg_fmla, data = reg_data)

Residuals:
    Min      1Q  Median      3Q     Max 
-8.6300 -1.0031  0.1861  1.2185  7.4898 

Coefficients:
                                        Estimate Std. Error t value Pr(>|t|)    
(Intercept)                           -4.7443927  0.3248828 -14.603  < 2e-16 ***
as.factor(year)2000                   -0.3839197  0.2902987  -1.322 0.186035    
as.factor(year)2001                   -0.7474464  0.3002206  -2.490 0.012804 *  
as.factor(year)2002                   -0.9318372  0.2784394  -3.347 0.000821 ***
as.factor(year)2003                   -1.2505112  0.2634377  -4.747 2.10e-06 ***
as.factor(year)2004                   -1.5210538  0.2644595  -5.752 9.13e-09 ***
as.factor(year)2005                   -1.3494622  0.2568724  -5.253 1.53e-07 ***
as.factor(year)2006                   -1.4227991  0.2570623  -5.535 3.20e-08 ***
as.factor(year)2007                   -1.4540782  0.2571524  -5.655 1.61e-08 ***
as.factor(year)2008                   -2.1078080  0.2568756  -8.206 2.61e-16 ***
as.factor(year)2009                   -1.2413963  0.2568777  -4.833 1.37e-06 ***
as.factor(year)2010                   -0.9394307  0.2612564  -3.596 0.000325 ***
as.factor(year)2011                   -1.1199622  0.2612836  -4.286 1.83e-05 ***
as.factor(year)2012                   -1.6021303  0.2621615  -6.111 1.03e-09 ***
as.factor(year)2013                   -1.1653702  0.2664487  -4.374 1.24e-05 ***
fished                                 0.9818545  0.1440916   6.814 1.01e-11 ***
as.factor(did)2004                    -0.3440541  0.2259494  -1.523 0.127867    
as.factor(did)2005                    -0.2747237  0.1862225  -1.475 0.140182    
as.factor(did)2006                    -0.2430356  0.1879294  -1.293 0.195965    
as.factor(did)2007                     0.0281801  0.1854401   0.152 0.879219    
as.factor(did)2008                     0.3224524  0.1869612   1.725 0.084615 .  
as.factor(did)2009                     0.0030925  0.1854786   0.017 0.986698    
as.factor(did)2010                    -0.2608361  0.2043295  -1.277 0.201795    
as.factor(did)2011                    -0.0863844  0.2068926  -0.418 0.676299    
as.factor(did)2012                    -0.0125608  0.2085814  -0.060 0.951982    
as.factor(did)2013                    -0.0996501  0.2245465  -0.444 0.657210    
trophicgroupbenthic micro-invertivore -0.8805295  0.0580442 -15.170  < 2e-16 ***
trophicgroupherbivore                  0.8533675  0.0767618  11.117  < 2e-16 ***
trophicgrouppiscivore                 -0.7031733  0.0745551  -9.432  < 2e-16 ***
trophicgroupplanktivore                0.0802367  0.0677855   1.184 0.236569    
vbgf.linf                             -0.0004153  0.0001775  -2.340 0.019321 *  
mean_temp                              0.1349210  0.0120940  11.156  < 2e-16 ***
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

Residual standard error: 1.748 on 9105 degrees of freedom
  (4753 observations deleted due to missingness)
Multiple R-squared:  0.1602,    Adjusted R-squared:  0.1574 
F-statistic: 56.05 on 31 and 9105 DF,  p-value: < 2.2e-16
broom::tidy(basic_reg)
  basic_reg %>%
    broom::tidy() %>%
    filter(str_detect(term,'did')) %>%
    mutate(year = as.numeric(str_replace(term,'as.factor\\(did\\)','')) )%>%
    ggplot() +
    geom_hline(aes(yintercept = 0)) + 
    geom_pointrange(aes(x = year, y = estimate, ymin = estimate - 1.96*std.error, ymax = estimate + 1.96 * std.error)) +
    geom_smooth(aes(x = year, y = estimate), method = 'lm')

  
    # stan_reg %>%
    # broom::tidy() %>%
    # filter(str_detect(term,'did')) %>%
    # mutate(year = as.numeric(str_replace(term,'as.factor\\(did\\)','')) )%>%
    # ggplot() +
    # geom_pointrange(aes(x = year, y = estimate, ymin = estimate - 1.96*std.error, ymax = estimate + 1.96 * std.error)) + 
    #       geom_smooth(aes(x = year, y = estimate), method = 'lm')
reg_data <- man_density_data %>% 
  ungroup() %>% 
  filter(is.na(targeted) == F) %>% 
  filter(biomass_ton_per_hectare > 0) %>% 
  mutate(log_biomass = log(biomass_ton_per_hectare),
         mlpa_in_effect = as.numeric(year > 2003),
         fished = as.numeric(targeted == 'Targeted'),
         did = as.numeric(fished * year * mlpa_in_effect))
reg_fmla <- as.formula('log_biomass ~  as.factor(year) + fished +  as.factor(did) + trophicgroup + vbgf.linf')
basic_reg <- lm(reg_fmla, data = reg_data)
# stan_reg <- rstanarm::stan_glm(reg_fmla, data = reg_data)
# summary(basic_reg)
# 
# broom::tidy(basic_reg)
  basic_reg %>%
    broom::tidy() %>%
    filter(str_detect(term,'did')) %>%
    mutate(year = as.numeric(str_replace(term,'as.factor\\(did\\)','')) )%>%
    ggplot() +
    geom_hline(aes(yintercept = 0)) + 
    geom_pointrange(aes(x = year, y = estimate, ymin = estimate - 1.96*std.error, ymax = estimate + 1.96 * std.error)) +
    geom_smooth(aes(x = year, y = estimate), method = 'lm')

  
    # stan_reg %>%
    # broom::tidy() %>%
    # filter(str_detect(term,'did')) %>%
    # mutate(year = as.numeric(str_replace(term,'as.factor\\(did\\)','')) )%>%
    # ggplot() +
    # geom_pointrange(aes(x = year, y = estimate, ymin = estimate - 1.96*std.error, ymax = estimate + 1.96 * std.error)) + 
    #       geom_smooth(aes(x = year, y = estimate), method = 'lm')

Well that’s not encouraging: this suggests that the density calculation really does matter here. You’ll go with Jen’s data for now, but big red flag of something that needs fixing here.

Are unfished and fished valid controls?

There are a few ways you could think about testing for this.

  1. Does the MLPA come out as a causal factor on the unfished species?

  2. Do fished speices/unfished species cause each other (do lags of fished species predict unfished and vice versa)

reg_data <- density_data %>% 
  mutate(classcode = toupper(classcode)) %>% 
    left_join(life_history_data, by = 'classcode') %>% 
  left_join(conditions_data, by = c('site','side','year')) %>% 
  filter(is.na(targeted) == F) %>% 
  filter(biomass > 0) %>% 
  mutate(log_biomass = log(biomass),
         mlpa_in_effect = as.numeric(year > 2003),
         unfished = as.numeric(targeted != 'Targeted'),
         did = as.numeric(unfished * year * mlpa_in_effect))
reg_fmla <- as.formula('log_biomass ~  as.factor(year) + mean_temp+unfished +  as.factor(did) + trophicgroup + vbgf.linf')
basic_reg <- lm(reg_fmla, data = reg_data)
# stan_reg <- rstanarm::stan_glm(reg_fmla, data = reg_data)
summary(basic_reg)

Call:
lm(formula = reg_fmla, data = reg_data)

Residuals:
    Min      1Q  Median      3Q     Max 
-8.6300 -1.0031  0.1861  1.2185  7.4898 

Coefficients:
                                        Estimate Std. Error t value Pr(>|t|)    
(Intercept)                           -3.7625382  0.3408704 -11.038  < 2e-16 ***
as.factor(year)2000                   -0.3839197  0.2902987  -1.322 0.186035    
as.factor(year)2001                   -0.7474464  0.3002206  -2.490 0.012804 *  
as.factor(year)2002                   -0.9318372  0.2784394  -3.347 0.000821 ***
as.factor(year)2003                   -1.2505112  0.2634377  -4.747 2.10e-06 ***
as.factor(year)2004                   -1.8651079  0.3117720  -5.982 2.28e-09 ***
as.factor(year)2005                   -1.6241858  0.2917863  -5.566 2.68e-08 ***
as.factor(year)2006                   -1.6658346  0.2919373  -5.706 1.19e-08 ***
as.factor(year)2007                   -1.4258980  0.2913804  -4.894 1.01e-06 ***
as.factor(year)2008                   -1.7853556  0.2914174  -6.126 9.36e-10 ***
as.factor(year)2009                   -1.2383039  0.2905469  -4.262 2.05e-05 ***
as.factor(year)2010                   -1.2002668  0.3007640  -3.991 6.64e-05 ***
as.factor(year)2011                   -1.2063466  0.3012316  -4.005 6.26e-05 ***
as.factor(year)2012                   -1.6146911  0.3024039  -5.340 9.54e-08 ***
as.factor(year)2013                   -1.2650203  0.3093716  -4.089 4.37e-05 ***
mean_temp                              0.1349210  0.0120940  11.156  < 2e-16 ***
unfished                              -0.9818545  0.1440916  -6.814 1.01e-11 ***
as.factor(did)2004                     0.3440541  0.2259494   1.523 0.127867    
as.factor(did)2005                     0.2747237  0.1862225   1.475 0.140182    
as.factor(did)2006                     0.2430356  0.1879294   1.293 0.195965    
as.factor(did)2007                    -0.0281801  0.1854401  -0.152 0.879219    
as.factor(did)2008                    -0.3224524  0.1869612  -1.725 0.084615 .  
as.factor(did)2009                    -0.0030925  0.1854786  -0.017 0.986698    
as.factor(did)2010                     0.2608361  0.2043295   1.277 0.201795    
as.factor(did)2011                     0.0863844  0.2068926   0.418 0.676299    
as.factor(did)2012                     0.0125608  0.2085814   0.060 0.951982    
as.factor(did)2013                     0.0996501  0.2245465   0.444 0.657210    
trophicgroupbenthic micro-invertivore -0.8805295  0.0580442 -15.170  < 2e-16 ***
trophicgroupherbivore                  0.8533675  0.0767618  11.117  < 2e-16 ***
trophicgrouppiscivore                 -0.7031733  0.0745551  -9.432  < 2e-16 ***
trophicgroupplanktivore                0.0802367  0.0677855   1.184 0.236569    
vbgf.linf                             -0.0004153  0.0001775  -2.340 0.019321 *  
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

Residual standard error: 1.748 on 9105 degrees of freedom
  (4753 observations deleted due to missingness)
Multiple R-squared:  0.1602,    Adjusted R-squared:  0.1574 
F-statistic: 56.05 on 31 and 9105 DF,  p-value: < 2.2e-16
broom::tidy(basic_reg)
  basic_reg %>%
    broom::tidy() %>%
    filter(str_detect(term,'did')) %>%
    mutate(year = as.numeric(str_replace(term,'as.factor\\(did\\)','')) )%>%
    ggplot() +
    geom_hline(aes(yintercept = 0)) + 
    geom_pointrange(aes(x = year, y = estimate, ymin = estimate - 1.96*std.error, ymax = estimate + 1.96 * std.error)) +
    geom_smooth(aes(x = year, y = estimate), method = 'lm')

I don’t think that that’s a valid comparison, since you at least assume that the fished aren’t a control. You could also just do a changepoint analysis?

Alternatively, let’s think about this mechanistically. If you believe that the “targeted” classification is real, then by definition the MPA has no direct effect on the fished species. The real question then is whether there are trophic interactions that cause a problem. So, why not run a regression of unfished densities as a function of targeted abundance, controlling for other crap?

Interesting, some evidence of effect but it’s messy, and depends a lot on model specification.

El Niño

a = read_html('https://www.esrl.noaa.gov/psd/enso/mei/table.html') %>% 
  html_node('body') %>% 
  html_text()
enso <- read_lines('https://www.esrl.noaa.gov/psd/enso/mei/table.html')
enso <- enso[str_detect(enso,'\t|(YEAR)')] %>% 
  write('enso.txt')
enso <-  read.csv('enso.txt', sep = '\t', header = F)
table_names <- enso$V1[1] %>% 
  as.character() %>% 
  str_split(boundary('word'), simplify = T) %>% 
  as.character() %>% 
  tolower()
enso <- enso %>% 
  slice(-1) %>% 
  as_data_frame()
colnames(enso) <-  table_names
enso <- enso %>% 
  gather('bimonth','enso',-year)
enso %>% 
  mutate(year = year %>% as.character() %>% as.numeric()) %>% 
  group_by(year) %>% 
  summarise(mean_enso = mean(enso)) %>% 
  ggplot(aes(year, mean_enso)) + 
  geom_point()

 enso <- read_table("http://www.esrl.noaa.gov/psd/gcos_wgsp/Timeseries/Data/nino34.long.anom.data",
            na = c("-99.99", "99.99",'-99'), skip = 1, n_max = lubridate::year(Sys.time()) - 1870 + 1,
            col_names = c("year", 1:12)) %>%
 gather(month, enso, -year) %>%
 mutate(month = as.double(month))
cols(
  year = col_integer(),
  `1` = col_double(),
  `2` = col_double(),
  `3` = col_double(),
  `4` = col_double(),
  `5` = col_double(),
  `6` = col_double(),
  `7` = col_double(),
  `8` = col_double(),
  `9` = col_double(),
  `10` = col_double(),
  `11` = col_double(),
  `12` = col_double()
)

Get PDO

 pdo <- read_table("https://www.esrl.noaa.gov/psd/gcos_wgsp/Timeseries/Data/pdo.long.data",
            na = c("-99.99", "99.99",'-99'), skip = 1, n_max = lubridate::year(Sys.time()) - 1900,
            col_names = c("year", 1:12)) %>%
 gather(month, pdo, -year) %>%
 mutate(month = as.double(month),
        date = lubridate::ymd(paste(year,month,'01', sep = '-')))
cols(
  year = col_integer(),
  `1` = col_double(),
  `2` = col_double(),
  `3` = col_double(),
  `4` = col_double(),
  `5` = col_double(),
  `6` = col_double(),
  `7` = col_double(),
  `8` = col_double(),
  `9` = col_double(),
  `10` = col_double(),
  `11` = col_double(),
  `12` = col_double()
)
pdo %>% 
  filter(year >= 2000) %>% 
  ggplot(aes(date,pdo)) +
  geom_hline(aes(yintercept = 0), linetype = 2) +
  geom_point()

Causal defense ideas

How can you defend the causal nature of the results?

Include a “lead” on the DiD term: This is turned on in the years leading up to the policy, and says that the policy is going to happen. SHould be insignificant if the model is right. Look back at mostly harmless and Olivier’s note

How well does out-of-sample prediction help you with causality? At it’s face it gives evidence that your model is doing a good job of describing the data. But does it imply help with causality? Suppose you had a model that said predict if it’s raining outside based on umbrellas. It’s out of sample prediction would be great, but that doesn’t mean that opening umbrellas is going to cause rain.

Regression Exploration

Goal of this section is to start really digging into the regressions.

Data Exploration

Let’s stick with Jen’s data for now, but cognizant that you need to look into reproducing and sensitivities to transformation assumptions

Pulling in Jen’s density data, merging some useful temperature, site, life history, enso, and PDO data

density_data <- read_csv('../data/ci_reserve_data_final3 txt.csv') %>%
  magrittr::set_colnames(.,tolower(colnames(.))) %>% 
  gather('concat.name','value', grep('_',colnames(.)),convert = T) %>%
  mutate(data.type = gsub('\\_.*', '', concat.name),
         classcode = gsub('.*\\_','',concat.name)) %>%
  mutate(value = as.numeric(value)) %>%
  spread(data.type,value) %>%
  rename(site_side = site.side)
Parsed with column specification:
cols(
  .default = col_double(),
  site = col_character(),
  side = col_character(),
  year = col_integer(),
  fishtrans = col_integer(),
  fish_BATH = col_integer(),
  fish_AINE = col_integer(),
  fish_CFAL = col_integer(),
  fish_CPUG = col_integer(),
  fish_NUNI = col_integer(),
  fish_ACAL = col_integer(),
  fish_HAZU = col_integer(),
  fish_SDAL = col_integer(),
  fish_SUMB = col_integer(),
  fish_PHOL = col_integer(),
  fish_RPRO = col_integer(),
  fish_XCAL = col_integer(),
  fish_RALL = col_integer(),
  fish_NCEP = col_integer(),
  fish_BOTH = col_integer(),
  fish_AVUL = col_integer()
  # ... with 59 more columns
)
See spec(...) for full column specifications.
length_data <- read_csv('../data/UCSB_FISH raw thru 2013.csv') %>% 
    magrittr::set_colnames(.,tolower(colnames(.)))
Parsed with column specification:
cols(
  .default = col_character(),
  year = col_integer(),
  month = col_integer(),
  day = col_integer(),
  transect = col_integer(),
  count = col_integer(),
  fish_tl = col_double(),
  min_tl = col_integer(),
  max_tl = col_integer(),
  depth = col_double(),
  vis = col_double(),
  temp = col_double(),
  pctcnpy = col_integer()
)
See spec(...) for full column specifications.
temperature_data <- length_data %>% 
  group_by(site,side,year) %>% 
  summarise(mean_temp = mean(temp, na.rm = T))
life_history_data <- read_csv('../data/VRG Fish Life History in MPA_04_08_11_12 11-Mar-2014.csv') %>%
  rename(classcode = pisco_classcode) %>%
  mutate(classcode = tolower(classcode)) %>% 
  magrittr::set_colnames(.,tolower(colnames(.)))
Parsed with column specification:
cols(
  .default = col_character(),
  include = col_integer(),
  Juv.cut.cm = col_double(),
  WL_a = col_double(),
  WL_b = col_double(),
  LC.a._for_WL = col_double(),
  LC.b._for_WL = col_double(),
  VBGF.Linf = col_double(),
  VBGF.k = col_double(),
  VBGF.t0 = col_double(),
  VBGF.Linf.F = col_double(),
  VBGF.k.F = col_double(),
  VBGF.t0.F = col_double(),
  VBGF.Linf.M = col_double(),
  VBGF.k.M = col_double(),
  VBGF.t0.M = col_double(),
  LC.a._for_VBGF = col_double(),
  LC.b._for_VBGF = col_double(),
  F_a = col_double(),
  F_b = col_double(),
  Size_Mature_cm = col_double()
  # ... with 7 more columns
)
See spec(...) for full column specifications.
life_names <- c('classcode',colnames(life_history_data)[!colnames(life_history_data) %in% colnames(density_data)])
life_history_data <- life_history_data[ , life_names]
site_data <- read_csv('../data/Final_Site_Table_UCSB.csv') %>%
    magrittr::set_colnames(.,tolower(colnames(.)))
Parsed with column specification:
cols(
  SITE = col_character(),
  SIDE = col_character(),
  SITE_SIDE = col_character(),
  MPAGROUP = col_character(),
  MPA_STATUS = col_character(),
  RESERVE = col_character(),
  USE = col_character(),
  REGION = col_character(),
  YEAR_MPA = col_integer(),
  MPAAREANM2 = col_double(),
  MPASHORENM = col_double(),
  lat_wgs84 = col_double(),
  lon_wgs84 = col_double()
)
site_names <- c('site','side',colnames(site_data)[!colnames(site_data) %in% colnames(density_data)])
site_data <- site_data[,site_names]
enso <- read_csv('../data/enso.csv') %>% 
  group_by(year) %>% 
  summarise(mean_enso = mean(enso, na.rm = T))
Parsed with column specification:
cols(
  year = col_integer(),
  bimonth = col_character(),
  enso = col_double()
)
pdo <- read_csv('../data/pdo.csv') %>% 
group_by(year) %>% 
  summarise(mean_pdo = mean(pdo, na.rm = T))
Parsed with column specification:
cols(
  year = col_integer(),
  month = col_double(),
  pdo = col_double(),
  date = col_date(format = "")
)
comp_data <- density_data %>% 
  left_join(temperature_data, by = c('site','side','year')) %>% 
  left_join(life_history_data, by = 'classcode') %>% 
  left_join(site_data, by = c('site','side')) %>% 
  left_join(enso, by = 'year') %>% 
  left_join(pdo, by = 'year')
  

Data Composition

Let’s look at the distribution of sample size over time; where are your data coming from?

comp_data %>% 
  group_by(year) %>% 
  summarise(num_obs = length(biomass)) %>% 
  ggplot(aes(year, num_obs)) + 
  geom_point()

NA
comp_data %>% 
  group_by(year,targeted) %>% 
  summarise(num_obs = length(biomass)) %>% 
  ggplot(aes(year, num_obs, fill = targeted)) + 
  geom_bar(stat = 'identity')

Huh, so a bunch of the observations are “unknown” on targeting status. Let’s look into this. Beyond that though, the targeted and non-targeted are fairly well balanced.

huh <- comp_data %>% 
  filter(is.na(targeted))
sort(unique(huh$classcode))
  [1] "aarg"           "agafim"         "agua"           "aine"          
  [5] "alamar"         "antsol"         "antspp"         "antxan"        
  [9] "aplcal"         "astarm"         "astser"         "astspp"        
 [13] "avul"           "balnub"         "bfreyoy"        "both"          
 [17] "canspp"         "carnivore"      "cencor"         "cfal"          
 [21] "clup"           "coscos"         "coveraglstr"    "coveranem"     
 [25] "coverbarnac"    "coverbarroc"    "coverbarsan"    "coverbrown"    
 [29] "coverbryo"      "coverclam"      "coverclavspp"   "covercomtun"   
 [33] "covercorcal"    "covercrucor"    "covercucspp"    "covercupcor"   
 [37] "covercysosm"    "coverdeadhold"  "coverdesspp"    "coverdiacal"   
 [41] "coverdicspp"    "coverdiocha"    "coverdodfew"    "coveregrmen"   
 [45] "coverencred"    "covererecor"    "coverfleshy"    "covergorgad"   
 [49] "covergreen"     "coverhetpac"    "coverhydroid"   "coverlamhold"  
 [53] "covermacpyr"    "covermud"       "covermussel"    "coverpacrub"   
 [57] "coverphrcal"    "coverphyspp"    "coversaltri"    "coversarfil"   
 [61] "coversarmut"    "coversarspp"    "coverscallop"   "coverscum"     
 [65] "coverserpet"    "covershell"     "coversoltun"    "coversponge"   
 [69] "coverstycal"    "coverstypor"    "covertubemat"   "covertubeworm" 
 [73] "coverturf"      "cpug"           "cpunyoy"        "cragig"        
 [77] "crysit"         "cryste"         "cucspp"         "cypspa"        
 [81] "cysosmad"       "derimb"         "diccal"         "egrmen"        
 [85] "eisarbad"       "eugrub"         "gorgad"         "halass"        
 [89] "halcor"         "halcra"         "halful"         "halkam"        
 [93] "halruf"         "halspp"         "halwal"         "henlev"        
 [97] "herbivore"      "hlag"           "hsemyoy"        "kelkel"        
[101] "laminaria"      "lepchi"         "lincol"         "loxgra"        
[105] "loxscy"         "luifol"         "lytanaad"       "macpyrad"      
[109] "macstipes"      "medaeq"         "megcre"         "meggib"        
[113] "megspp"         "megund"         "metspp"         "mimfol"        
[117] "mmol"           "murcal"         "murfru"         "ncep"          
[121] "nerlue"         "nontargeted"    "nuni"           "ocalyoy"       
[125] "ortkoe"         "pacfim"         "panint"         "parcal"        
[129] "parpar"         "parspp"         "patmin"         "patr"          
[133] "pclayoy"        "pgla"           "pisbre"         "piscivore"     
[137] "pisgig"         "pisoch"         "planktivore"    "pleu"          
[141] "pmac"           "pnot"           "ptecalad"       "pugpro"        
[145] "pugric"         "pugspp"         "pychel"         "rath"          
[149] "reliefflatrel"  "reliefhirel"    "reliefmodrel"   "reliefsltrel"  
[153] "rjor"           "rpro"           "ryoy"           "saca"          
[157] "smysyoy"        "sneb"           "spauyoy"        "spinyoy"       
[161] "spulyoy"        "ssem"           "strfraad"       "strpurad"      
[165] "styfor"         "stymon"         "substratebedrk" "substratebould"
[169] "substratecob"   "substratesand"  "sumb"           "targeted"      
[173] "tealof"         "tetaur"         "unid"           "urtcol"        
[177] "urticina"       "zexa"          

Aha, these appear to be a a bunch of misc. critters and classification for understory cover. Aslo “young of the year”. Probably best to filter these guys out.

comp_data %>% 
  filter(is.na(commonname) == F) %>% 
   group_by(year) %>% 
  summarise(num_obs = length(biomass)) %>% 
  ggplot(aes(year, num_obs)) + 
  geom_point()

NA
comp_data %>% 
  filter(is.na(commonname) == F) %>% 
  group_by(year,region) %>% 
  summarise(num_obs = length(biomass)) %>% 
  ggplot(aes(year, num_obs, fill = region)) + 
  geom_bar(stat = 'identity')

Huh, worth noting that a few of the islands only come in after 2003. What is this doing to your effect that you’re basically brining in a bunch of new islands with different effects right after implementation of MPAs in 2003?

comp_data %>% 
  filter(is.na(commonname) == F) %>% 
  group_by(year,broadtrophic) %>% 
  summarise(num_obs = length(biomass)) %>% 
  ggplot(aes(year, num_obs, fill = broadtrophic)) + 
  geom_bar(stat = 'identity')

Effect Exploration

Let’s look at the change in density as a functino of a variety of variables

candidates <- c('biomass','year','site','side','classcode',
                'region','broadtrophic','reserve','campus','mpaareanm2','mean_temp','guild','trophicgroup','targeted',
                'mean_enso','mean_pdo','vbgf.k','vbgf.linf','max_length_fishbase')
reg_data <- comp_data[,candidates] %>% 
  filter(is.na(biomass) == 0, is.na(targeted) == F) %>% 
  mutate(log_density = log(biomass),
         factor_year = as.factor(year),
         targeted = as.numeric(targeted == 'Targeted'),
         post_mlpa = as.numeric(year >= 2003),
         reserve = as.numeric(reserve == 'IN'),
         did = as.factor(targeted * post_mlpa * year)
         )
reg_data %>% 
  group_by(year,targeted) %>% 
  summarise(mean_density = mean(biomass, na.rm = T)) %>% 
  ggplot(aes(year,mean_density, color = factor(targeted))) + 
  geom_line()

reg_data %>% 
  group_by(year,targeted,region) %>% 
  summarise(mean_density = mean(log(biomass + 1e-3), na.rm = T)) %>% 
  ggplot(aes(year,mean_density, color = factor(targeted))) + 
  geom_line() + 
  facet_wrap(~region) + 
  ylab('log mean density')

One troubling pattern, both ANA and SCI show that steep drop in densities from 2000-2003, which might really be skewing the effect of the MPAs (that early negative effect), though on the plus side it’s both targetted and non-targeted

Let’s look at some other factors here

reg_data %>% 
  group_by(broadtrophic) %>% 
  summarise(mean_density = mean(biomass, na.rm = T)) %>% 
  ggplot(aes(broadtrophic,mean_density, fill = broadtrophic)) + 
  geom_bar(stat = 'identity')

No surprises, higher density of herbivores

reg_data %>% 
  group_by(region) %>% 
  ggplot(aes(region,biomass, fill = region)) + 
  geom_boxplot()

reg_data %>% 
  select(biomass,year,mean_temp, mean_pdo, mean_enso) %>% 
  gather(metric,value, contains('mean')) %>% 
  group_by(year,metric) %>% 
  summarise(mean_biomass = log(mean(biomass, na.rm = T)), mean_value = mean(value, na.rm = T)) %>% 
  ggplot(aes(mean_value,mean_biomass, fill = metric)) + 
  geom_abline(aes(intercept = mean(mean_biomass, na.rm = T), slope = 0)) + 
  geom_point(shape = 21) + 
  facet_wrap(~metric, scales = 'free_x')

Interesting, definitely some noise in here, but seems to a a trend towards the “mean” of the environmental covariates. Good evidence for throwing a quadratic in here.

environmental_cor = reg_data %>% 
  filter(is.na(mean_temp) == F) %>% 
  select(mean_temp, mean_enso, mean_pdo, year) %>%
  cor() 
corrplot::corrplot(environmental_cor)

Quite a bit of correlation between some of the environmental variables, but you don’t really care about estimating those precisely so not a huge deal.

But, the year thing could be a bit messy, since you have such poor contrast before and after MPA, so might need to look at mean PDO as a problem variable.

DiD Structure

Now, what should the actual difference in difference look like?

The key thing you need is treatment on the treated, so that’s being a fished species post 2003.

If you include species specific fixed effects, then you don’t have to include the general effect of being a fished species, since those will all come out in the wash (i.e. the fished effect is internalized in the species specific fixed effects, or alternatively, you say that the species level effect is a function of covariates, including being fished)

Similarly with post-mpa. If you don’t include year fixed effects, then you need it. But, if you include year fixed effects, then the “post mpa” effect should be soaked up, and in fact perfectly colinear with, the year fixed effects.

So, now, the real pain in the ass is the year terms. Which I think you figured out!

The question then is how do you control for other things.

You could just include the DiD term

Bare-bones regression

Let’s start with a bare bones regression, using rstanarm, one hierarchichal, one not. You’ll then compare your results to one where you code the likelihood yourself

reg_vars <- c('log_density','factor_year', 'year','targeted', 'did', 'region' ,
'mean_enso','mean_pdo', 'mean_temp','classcode','site','side','post_mlpa','region',
'broadtrophic')

has_all <- function(x) any(is.na(x)) == F

pos_reg_data <- reg_data %>% 
  select(-mean_temp) %>% 
  left_join(conditions_data %>% select(site,side,year, contains('mean')), by = c('site','side','year')) %>% 
  filter(biomass >0, year >=2000) %>% 
  select_(.dots = as.list(reg_vars)) %>% 
  mutate(has_all_vars = apply(.,1,has_all)) %>% 
  filter(has_all_vars == T) %>% 
  map2_df(colnames(.), center_scale, omit_names = c('log_density','year','mean_enso','mean_pdo')) %>%
  mutate(did_dummy = 1 * targeted,
         did_year = paste('did',year, sep = '_')) %>% 
  spread(did_year,did_dummy, fill = 0) %>% 
  mutate(temp2 = mean_temp^2,
         pdo2 = mean_pdo^2,
         enso2 = mean_enso^2,
         site_side = paste(site, side, sep = '_'))

# pos_reg_data$did_2003[pos_reg_data$did_2003 == 1] <-  0.5

pos_reg_data %>% 
  group_by(factor_year,targeted) %>% 
  summarise(mb = mean(log_density, na.rm = T)) %>% 
  ggplot(aes(factor_year, mb, color = factor(targeted))) + 
  geom_point()


did_years <- paste('did',2000:2013, sep = '_')

did_year <- did_years[did_years!='did_2000']


simple_reg <-
as.formula(
paste0(
'log_density ~',
paste(did_year, collapse = '+'),
' + (1|year) + (1 + mean_temp + temp2 |classcode) + mean_enso +  mean_pdo + (1 | site_side) + (1 | site) + (1 | region) + targeted + post_mlpa'
)
)

# + vbgf.linf +
# vbgf.k +
# mean_enso + enso2 + mean_pdo + pdo2 + mean_temp  + temp2')

freq_flat_reg <- lme4::lmer(simple_reg, data = pos_reg_data)

flat_reg <- stan_glmer(simple_reg, data = pos_reg_data,chains = 1)


freq_did_plot <-  freq_flat_reg %>% 
  tidy() %>% 
  mutate(lower = estimate - 1.96 * std.error,
         upper = estimate + 1.96 * std.error) %>% 
  filter(str_detect(term,'did')) %>% 
  mutate(year = str_replace(term, 'did_','') %>% as.numeric()) %>% 
  ggplot() +
geom_hline(aes(yintercept = 0)) + 
  geom_vline(aes(xintercept = 2003), color = 'red', linetype = 2) +
  geom_pointrange(aes(year,estimate, ymax = upper, ymin = lower)) + 
  ylab('Estimated MLPA Effect') + 
  ggrepel::geom_text_repel(data = data_frame(x = 2003, y = 1), aes(x,y, label = 'MLPA Enacted'),nudge_x = 2) + 
  xlab('Year')
         
freq_did_plot

did_plot <- flat_reg %>% 
  as.data.frame() 
  
  did_plot <- did_plot[,str_detect(colnames(did_plot), 'did_')] %>% 
  gather(did,effect) %>% 
  mutate(year = str_replace(did,'did_','') %>% as.numeric()) %>% 
  group_by(year) %>% 
  summarise(mean_effect = mean(effect),
            top = sort(effect)[round(.975 * length(effect))],
            bottom = sort(effect)[round(.025 * length(effect))]) %>% 
  ggplot() + 
  geom_hline(aes(yintercept = 0)) + 
  geom_vline(aes(xintercept = 2003), color = 'red', linetype = 2) +
  geom_pointrange(aes(year,mean_effect, ymax = top, ymin = bottom)) + 
  ylab('Estimated MLPA Effect') + 
  ggrepel::geom_text_repel(data = data_frame(x = 2003, y = 1), aes(x,y, label = 'MLPA Enacted'),nudge_x = 2) + 
  xlab('Year')

did_plot
ggsave('new_mlpa_did.pdf',did_plot)

mtcars %>% 
  map_df(center_scale)

Now let’s try and replicate with a custom STAN function

did_plot <- flat_reg %>% 
  as.data.frame() %>% 
  select(contains('classcode')) %>% 
  gather(classcode,effect) %>% 
  mutate(classcode = str_replace(classcode,'classcode',''))

stan_reg_data <- pos_reg_data %>% 
  mutate(year_marker = 1) %>% 
  mutate(year = paste('year',year, sep = '_')) %>% 
  spread(year,year_marker, fill = 0) %>% 
  mutate(classcode = paste('classcode',classcode, sep = '_'),
         class_marker = 1) %>% 
  spread(classcode,class_marker, fill = 0) %>% 
    mutate(site = paste('site',site, sep = '_'),
         site_marker = 1) %>% 
  spread(site,site_marker, fill = 0) %>% 
  select(log_density, contains('year_'), contains('did_'),
         contains('classcode_'), contains('site_')) %>% 
  mutate(constant = 1)

levels_to_drop <- c(min(pos_reg_data$year),sort(unique(pos_reg_data$classcode))[1],
                    sort(unique(pos_reg_data$site))[1])

stan_reg_data <- stan_reg_data %>% 
  select(-year_2000, -did_2003, -site_ANACAPA_ADMIRALS, -classcode_acor)
                    

y <- as.numeric(stan_reg_data$log_density)

x <- stan_reg_data %>% 
  select(-log_density) %>% 
  as.matrix()


stan_fit <- stan(
  file = '../scripts/ahnold_reg.stan',
  data = list(
    num_pars = dim(x)[2],
    num_obs = length(y),
    y = y,
    x = x
  ),
  chains = 4, 
  warmup = 1000,
  iter = 2000,
  cores = 4, 
  refresh = 100
)

did_terms <- which(str_detect(colnames(x),'did'))

did_plot <- stan_fit %>% 
  as.data.frame() %>% 
  select_(.dots = as.list(did_terms)) %>% 
  gather(did,effect) %>% 
  # mutate(year = str_replace(did,'did_','') %>% as.numeric()) %>% 
  group_by(did) %>% 
  summarise(mean_effect = mean(effect),
            top = sort(effect)[round(.975 * length(effect))],
            bottom = sort(effect)[round(.025 * length(effect))]) %>% 
  ggplot() + 
  geom_hline(aes(yintercept = 0)) + 
  # geom_vline(aes(xintercept = 2003), color = 'red', linetype = 2) +
  geom_pointrange(aes(1:length(mean_effect),mean_effect, ymax = top, ymin = bottom)) + 
  ylab('Estimated MLPA Effect') + 
  # ggrepel::geom_text_repel(data = data_frame(x = 2003, y = 1), aes(x,y, label = 'MLPA Enacted'),nudge_x = 2) + 
  xlab('Year')

OK! Things work. Moving over to run_ahnold for formal construction of this process

Multi-level (hierarchichal) notes

OK, so I think I’m finally starting to get the hang of this. The confusion, from a regression point of view, is how mechanical do you have to be about the “multilevel part”.

Look at Gelman 12.15 and the code there.

Your confusion has been, say you’ve got observations at the transect level, and you want to include species level covariates. In a fixed effects world this wouldn’t work: you can’t estiamtes species level fixed effects, along with things like vbk. that don’t vary within a species. You can either include species fixed effects, or component things that define species.

The way Gelman talks about this in multilevel modeling, is that you can instead model this in a hierarchical manner, where the coefficients of the species fixed effects are a function of things like vbk. This makes sense, but the question is the mechanics of this. My impression was that I would have to do this all manually, i.e. take the betas of the fixed effects, then write up a regression of those betas as a function of vbk etc. Makes sense but a total pain in the ass.

Looking at Gelman page 266 though, it becomes clearer. It seems like mechanically you can do this just by converting the fixed effects to clustered random effects, and then including vbk as just another coefficient.

e.g. lmer(y ~ x + vbk + (1 | species))

Propensity Scores

Add in broad bioregion effect

That could just be recovery inside and not spillover. Do you have to have spillover

They don’t count the

Look at Steve’s BACI response to Ray’s

What happens to the regulaions outside in addition to the reserve. A simple model that looks at what would you expect

Good data for the mainland

Look into literature for priors

I’m going to show that you can disentangle recruitment from lenghts

2017-03-15

Expressing MLPA effect

Might be a good idea to show the significance statistically, but the effect through simulation. Simulate draws from the joint posterior with and without MLPA, show mean densities. Easier to understand, allows you to combine the net effect of hurdle component

2017-03-16

Check in with Jen

  • Check in with the idea of more local indicies of ENSO/PDO

  • No big ENSO event till 2013

  • Might need to explore removing painted greenling since they are bottom dwelling cryptic-ish. Tend to be counted more frequently when there wasn’t much else.

  • Drop santa barbara island entirely (one off sample, 4 years)

  • Let’s think about parsing this by species. Fishing is very different across different species in here, what species are really driving the analysis? Are different species disproportionately driving the results

  • We have before and after data from the overflights, so could at least use that as a prior on the scale of redistribution. SAMSAP might be able to pick up the potential “blue paradox” side of things, as

  • What if this is all just a recruitment signal

  • We could look at the SMURF data to get an idea of the recruitment trends

  • Something weird happened in 2008-2010 across all the CI data

  • PISCO assumes that all species have the same probability of being seen anywhere in all the islands. Might be worth looking into what happens if you zero fill by data-base wide

2017-21-03

Things are looking pretty strong. Basic Model diagnostics look solid for the most part, though the LMER approach is a bit skewed to the right. But no alarm bells. The biggest issue is really low effective samples sizes for several species, as well as collinearity between a few of the model parameters (load up the STAN run and launch shinystan to dig into this a bit).

A few things you can do to try and deal with that. Re-run the model with only let’s say the top 75% percentile by positive occurance species, to make sure that you’re not getting too thrown by a bunch of really rare species. Can also try rerunning without painted greenling, since Jen seems to think those buggers might be a problem. I’m a little hesitant to do too much judgement based pruning of the data. Key things is does it really cause the results to break down.

Jen also raised the issue of the zero filling. You did your zero filling by site. It seems that when Jen was going from raw lengths to densities, she was zero filling by region. So, a garibalidi should be missing from every transect in San Miguel. This seems a litle extreme to me, but you can play with this once you get the ability to move from lenghts to densities. I really want to hone in on any subjective decisions that might be driving the results, and this seems like a potentially big candidate .

Update: removing painted greenling has no real effect on results. Phew.

Running now with only the top 50th percentile of species incorporated in the analysis, seeing what that does.

One thought on the random vs. fixed effects thing: should species really be a random effect? Maybe you should only have varying slopes but not intercepts by species, so have fixed effects by species category, and then temperature and region effects by species.

Wooooo species effects don’t matter either

You should go through and spell out why you think some things are random and some are fixed effects, drawing from the gelman framework, where the nature of the hierarchichal prior is really the defining characteristic

Sketching out Hypothesis Runs

What are the hypothetical states of the MPA world that you want to run over? One option is to specify a handful of boutique model runs: model 1, 2, 3 etc, each designed to evaluate a very specific set of circumstances. The other is to specify a set of key levels, and then monte carlo over combinations of those levers.

Regardless, the key things seem to be

  1. Strength of density dependence
  2. Nature of density dependence
  3. initial b and f
  4. Larval movement
  5. Adult movement
  6. Age at maturity
  7. Fleet reaction
  8. Species composition
  9. Scale of MPA relative to movement

The first 6 are relatively straightforward. Seems possibly easier to monte carlo and then pull out case study scenarios that you are interested in, since speed really isn’t the issue here.

You could also use that to easily create ensembles of species.

So, for a given “run” you’d specify how many species you want to generate, and then project them and aggregate. So, you could specify one species if you want ot make it clear, or an assemblage of many different species.

The risk with this approach is that you might just get a gigantic mess of results. But, you can easily pare this approach down to a concrete set if you want. I think this gets away from “cherry picking” scenarios that fit your story. You can present a giant array in trelliscope of outcomes, and then choose some to present in the paper.

  1. Strength of density dependence
    • this is just steepness
  2. Nature of density dependence
    • You can use the babcock definitions that are already in there
  3. f
    • One option is just to say pick a random F
    • The other option is some kind of effort dynamics model, from open access to one way trip etc.
    • Seems excessive, the key thing is it overfished or not, and is it getting worse or not
    • Pick a random f, run it out
    • Stop the thing at some random point…. problem there is you can’t be in “recovery”
  4. Larval movement
    • harder. Will take some thought to do this one right really. Key thing is do you want to deal with larvae or with recruits really. See how you do it in GASP
  5. Adult movement
    • Easy enough, what proportion of adults move from each cell
  6. Age at maturity
    • simple enough
  7. Fleet reaction
    • Dilute
    • Concentrate
  8. Species composition
  9. Scale of MPA relative to movement
    • Make the MPAs about the pattern of CI

For now, let’s take the life history straight out of the data that are actually used. So, grab the species that make it through the filter and project those. So, each run will take the life history data and run with it with a random assortment of traits.

Let’s focus on keeping it simple stupid. Take the species, project forward with a random F and a random stop point, a steepness drawn from the family distribution for that species, a random density denepence form… let’s spell it out

  1. Draw a species from the database
  2. Fill in missing non-variale life history characteristics (e.g. linf)
  3. Randomly assign characteristics of interest (density dependence form, movement, etc.)
  4. Create completed fish object
  5. Create a fleet object
  6. Fill in the fleet object with things like F, or fleet model dynamics etc.
  7. Pass a completed fish and fleet object to sim_fishery
  8. Store results
  9. Repeat runs a whole bunch of times.
  10. Aggregate as desired

4/13/17

As long as the errors in your dependent variable are mean 0, then there’s no problem. If they are not, or if the errors are a function of your dependent variable, then you do have problem

Look in to stratification from that book of Kyles

Check in with COdy

Make mortality/growth rates a function of biomass

Check on lit on this

Make plot of length on x and year on y, so you get a bunch of stacked shifts in cohorts

Maybe add in a ricker function, though might need to double check on the steepness values. You might be moving things over the to right hand side of

Examine OST catch data

ost_region_catch %>% 
  mutate(pounds = str_replace(pounds,',','') %>% as.numeric(),
         revenue = str_replace_all(revenue,"\\$|,",''),
         price = str_replace(`average price`,'\\$','')) %>% 
  group_by(year,fishery) %>% 
  summarise(catch_lbs = sum(pounds, na.rm = T)) %>% 
  ggplot(aes(year,catch_lbs, color = fishery)) + 
  geom_vline(aes(xintercept = 2003), linetype = 2, color = 'red') +
  geom_line()
NAs introduced by coercionWarning messages:
1: Unknown or uninitialised column: 'V1'. 
2: Unknown or uninitialised column: 'V1'. 

Interesting, stablish throughout the SC region, though that’s a pretty damn big area

ost_port_catch %>% 
  filter(port %in% c("Port Hueneme/Oxnard", "Santa Barbara","Ventura")) %>% 
  mutate(pounds = str_replace(pounds,',','') %>% as.numeric(),
         revenue = str_replace_all(revenue,"\\$|,",''),
         price = str_replace(`average price`,'\\$','')) %>% 
  group_by(year,fishery) %>% 
  summarise(catch_lbs = sum(pounds, na.rm = T)) %>% 
  ggplot(aes(year,catch_lbs, color = fishery)) + 
  geom_vline(aes(xintercept = 2003), linetype = 2, color = 'red') +
  geom_line(show.legend = F) + 
  facet_wrap(~fishery, scales = 'free_y') + 
  theme(axis.text.y =  element_blank(),
        axis.text.x = element_blank())
NAs introduced by coercionWarning messages:
1: Unknown or uninitialised column: 'V1'. 
2: Unknown or uninitialised column: 'V1'. 
3: Unknown or uninitialised column: 'V1'. 
4: Unknown or uninitialised column: 'V1'. 

4/14/17

I’m honestly close to out of ideas. These catch dat are pretty worring to be honest. If you buy these data as being representative of the fished species out in the channel islands, then this suggests a decreasin gcatch trend in nearshore finfish catches, not stable or increasing. That really makes it hard to explain the MPA mediated “decrease” in abundance.

Possible explanations that remain.

  1. Some form of density dependent mortality (or ricker like dynamics)
  • Can model this
  1. Catches at the islands themselves have been stable or gone up
  • Can try and get some more regional data to check on this
  1. Ocam’s razor: There was just a dramatic enough shift in sampling regimes when the MPAs went in place that pre-and-post are just not comparable in any way shape of form

  2. Let’s write up the current results and state of the world this weekend/next week and send that to committee for comments

Scraping CDFW Data

map_df(a, process_cdfw)
Called from: .f(.x[[i]], ...)
debug at #21: x <- map_df(x, ~numfoo)
x <- map_df(x, ~numfoo)
Error: cannot convert object to a data frame
x
View(x)
a
[[1]]
      [,1]                                                     [,2]          [,3]         
 [1,] "alifornia Waters"                                       ""            ""           
 [2,] "Fishes"                                                 ""            ""           
 [3,] "Anchovy, northern............................. "        " 1,135,732 " " 1,425,102 "
 [4,] "Barracuda, California......................... "        " 8 "         " 77 "       
 [5,] "Bass, giant sea................................. "      " 85 "        " 63 "       
 [6,] "Blacksmith......................................... "   " 0 "         " 0 "        
 [7,] "Bonito, Pacific................................... "    " 13,618 "    " 38 "       
 [8,] "Butterfish (Pacific pompano)......... "                 " 3 "         " 0 "        
 [9,] "Cabezon............................................ "   " 728 "       " 1,098 "    
[10,] "Croaker, white................................... "     " 986 "       " 946 "      
[11,] "Eel, California moray......................... "        " 0 "         " 0 "        
[12,] "Fish, unspecified............................... "      " 122 "       " 5 "        
[13,] "Flounder, arrowtooth......................... "         " 0 "         " 0 "        
[14,] "Flounder, starry................................. "     " 5 "         " 0 "        
[15,] "Flounder, unspecified........................ "         " 0 "         " 0 "        
[16,] "Flyingfish........................................... " " 0 "         " 0 "        
[17,] "Greenling, kelp.................................. "     " 0 "         " 0 "        
[18,] "Guitarfish, shovelnose....................... "         " 1,391 "     " 314 "      
[19,] "Halfmoon........................................... "   " 0 "         " 0 "        
[20,] "Halibut, California.............................. "     " 17,401 "    " 15,195 "   
[21,] "Halibut, unspecified........................... "       " 36 "        " 19 "       
[22,] "Jacksmelt.......................................... "   " 0 "         " 0 "        
[23,] "Lingcod............................................. "  " 0 "         " 0 "        
[24,] "Lizardfish, California.......................... "      " 36 "        " 0 "        
[25,] "Louvar............................................... " " 0 "         " 0 "        
[26,] "Mackerel, Pacific............................... "      " 1 "         " 0 "        
[27,] "Mackerel, jack................................... "     " 0 "         " 1 "        
[28,] "Opah................................................. " " 97 "        " 0 "        
[29,] "Opaleye............................................. "  " 0 "         " 0 "        
[30,] "Queenfish.......................................... "   " 0 "         " 0 "        
[31,] "Ray, Pacific electric........................... "      " 0 "         " 96 "       
[32,] "Ray, bat............................................. " " 0 "         " 0 "        
[33,] "Rockfish, Pacific ocean perch........... "              " 0 "         " 0 "        
[34,] "Rockfish, black-and-yellow................ "            " 0 "         " 0 "        
[35,] "Rockfish, black.................................. "     " 0 "         " 0 "        
[36,] "Rockfish, blackgill.............................. "     " 0 "         " 38 "       
[37,] "Rockfish, blue................................... "     " 0 "         " 0 "        
[38,] "Rockfish, bocaccio............................ "        " 51 "        " 0 "        
      [,4]        [,5]        [,6]        [,7]          [,8]        [,9]        [,10]      
 [1,] ""          ""          ""          ""            ""          ""          ""         
 [2,] ""          ""          ""          ""            ""          ""          ""         
 [3,] " 207,906 " " 303,040 " " 714,516 " " 1,041,818 " " 789,275 " " 735,579 " " 401,055 "
 [4,] " 33 "      " 1,736 "   " 2,521 "   " 3,047 "     " 818 "     " 177 "     " 211 "    
 [5,] " 129 "     " 71 "      " 351 "     " 537 "       " 651 "     " 154 "     " 9 "      
 [6,] " 0 "       " 0 "       " 0 "       " 0 "         " 0 "       " 0 "       " 0 "      
 [7,] " 0 "       " 0 "       " 0 "       " 0 "         " 0 "       " 0 "       " 289 "    
 [8,] " 53 "      " 27 "      " 55 "      " 0 "         " 0 "       " 0 "       " 0 "      
 [9,] " 2,616 "   " 2,835 "   " 4,955 "   " 6,001 "     " 3,236 "   " 5,076 "   " 6,675 "  
[10,] " 464 "     " 253 "     " 1,606 "   " 443 "       " 117 "     " 7 "       " 859 "    
[11,] " 0 "       " 0 "       " 0 "       " 0 "         " 0 "       " 0 "       " 0 "      
[12,] " 5 "       " 14 "      " 11 "      " 7 "         " 0 "       " 0 "       " 0 "      
[13,] " 0 "       " 5 "       " 0 "       " 0 "         " 0 "       " 0 "       " 0 "      
[14,] " 0 "       " 0 "       " 0 "       " 0 "         " 0 "       " 0 "       " 0 "      
[15,] " 0 "       " 6 "       " 0 "       " 0 "         " 9 "       " 0 "       " 0 "      
[16,] " 0 "       " 0 "       " 0 "       " 0 "         " 0 "       " 5 "       " 0 "      
[17,] " 0 "       " 0 "       " 4 "       " 0 "         " 0 "       " 1 "       " 11 "     
[18,] " 1,841 "   " 1,726 "   " 445 "     " 135 "       " 285 "     " 92 "      " 719 "    
[19,] " 32 "      " 231 "     " 44 "      " 187 "       " 24 "      " 55 "      " 314 "    
[20,] " 22,806 "  " 20,469 "  " 12,895 "  " 31,321 "    " 35,175 "  " 15,626 "  " 8,565 "  
[21,] " 195 "     " 50 "      " 319 "     " 231 "       " 123 "     " 0 "       " 0 "      
[22,] " 0 "       " 0 "       " 0 "       " 0 "         " 0 "       " 0 "       " 1 "      
[23,] " 84 "      " 0 "       " 488 "     " 237 "       " 203 "     " 17 "      " 18 "     
[24,] " 50 "      " 106 "     " 234 "     " 0 "         " 0 "       " 0 "       " 0 "      
[25,] " 0 "       " 0 "       " 0 "       " 22 "        " 0 "       " 54 "      " 28 "     
[26,] " 10 "      " 7 "       " 0 "       " 283,330 "   " 29 "      " 196,139 " " 27,528 " 
[27,] " 0 "       " 0 "       " 0 "       " 0 "         " 16 "      " 20,007 "  " 22 "     
[28,] " 0 "       " 0 "       " 0 "       " 0 "         " 0 "       " 40 "      " 50 "     
[29,] " 3 "       " 0 "       " 0 "       " 0 "         " 5 "       " 0 "       " 0 "      
[30,] " 0 "       " 0 "       " 2 "       " 0 "         " 0 "       " 0 "       " 0 "      
[31,] " 0 "       " 0 "       " 0 "       " 0 "         " 10 "      " 170 "     " 60 "     
[32,] " 0 "       " 0 "       " 0 "       " 0 "         " 45 "      " 25 "      " 18 "     
[33,] " 0 "       " 0 "       " 0 "       " 63 "        " 0 "       " 0 "       " 0 "      
[34,] " 0 "       " 4 "       " 0 "       " 0 "         " 0 "       " 10 "      " 17 "     
[35,] " 0 "       " 0 "       " 489 "     " 0 "         " 0 "       " 0 "       " 0 "      
[36,] " 2,225 "   " 213 "     " 1,554 "   " 995 "       " 777 "     " 29 "      " 2,482 "  
[37,] " 19 "      " 8 "       " 270 "     " 5 "         " 0 "       " 0 "       " 0 "      
[38,] " 26 "      " 13 "      " 58 "      " 65 "        " 49 "      " 30 "      " 27 "     
      [,11]       [,12]       [,13]      [,14]       
 [1,] ""          ""          ""         ""          
 [2,] ""          ""          ""         ""          
 [3,] " 346,156 " " 647,023 " " 75,357 " " 7,822,559"
 [4,] " 199 "     " 253 "     " 80 "     " 9,159"    
 [5,] " 0 "       " 10 "      " 93 "     " 2,154"    
 [6,] " 0 "       " 0 "       " 17 "     " 17"       
 [7,] " 0 "       " 9 "       " 66,625 " " 80,579"   
 [8,] " 80 "      " 284 "     " 50 "     " 552"      
 [9,] " 4,114 "   " 4,195 "   " 5,377 "  " 46,907"   
[10,] " 7,877 "   " 2,565 "   " 9,801 "  " 25,923"   
[11,] " 0 "       " 13 "      " 90 "     " 103"      
[12,] " 0 "       " 0 "       " 0 "      " 164"      
[13,] " 0 "       " 0 "       " 0 "      " 5"        
[14,] " 0 "       " 0 "       " 0 "      " 5"        
[15,] " 0 "       " 0 "       " 0 "      " 15"       
[16,] " 0 "       " 0 "       " 0 "      " 5"        
[17,] " 4 "       " 0 "       " 2 "      " 23"       
[18,] " 2,430 "   " 619 "     " 308 "    " 10,305"   
[19,] " 11 "      " 9 "       " 39 "     " 945"      
[20,] " 9,349 "   " 10,831 "  " 14,724 " " 214,357"  
[21,] " 0 "       " 0 "       " 0 "      " 974"      
[22,] " 0 "       " 0 "       " 0 "      " 1"        
[23,] " 159 "     " 12 "      " 0 "      " 1,217"    
[24,] " 0 "       " 30 "      " 6 "      " 462"      
[25,] " 179 "     " 15 "      " 0 "      " 297"      
[26,] " 1,187 "   " 0 "       " 0 "      " 508,231"  
[27,] " 4 "       " 0 "       " 0 "      " 20,049"   
[28,] " 2,914 "   " 344 "     " 76 "     " 3,521"    
[29,] " 0 "       " 1 "       " 2 "      " 11"       
[30,] " 0 "       " 0 "       " 0 "      " 2"        
[31,] " 80 "      " 0 "       " 0 "      " 416"      
[32,] " 0 "       " 0 "       " 0 "      " 88"       
[33,] " 0 "       " 0 "       " 0 "      " 63"       
[34,] " 3 "       " 0 "       " 0 "      " 33"       
[35,] " 0 "       " 0 "       " 0 "      " 489"      
[36,] " 2,178 "   " 6,766 "   " 598 "    " 17,855"   
[37,] " 15 "      " 0 "       " 11 "     " 327"      
[38,] " 71 "      " 89 "      " 172 "    " 651"      

[[2]]
      [,1]                                                    [,2]        [,3]        [,4]      
 [1,] "alifornia Waters"                                      ""          ""          ""        
 [2,] "Fishes"                                                ""          ""          ""        
 [3,] "Rockfish, brown................................. "     " 0 "       " 0 "       " 17 "    
 [4,] "Rockfish, canary................................ "     " 0 "       " 0 "       " 7 "     
 [5,] "Rockfish, chilipepper......................... "       " 0 "       " 22 "      " 6 "     
 [6,] "Rockfish, copper............................... "      " 0 "       " 0 "       " 278 "   
 [7,] "Rockfish, cowcod.............................. "       " 0 "       " 0 "       " 0 "     
 [8,] "Rockfish, gopher............................... "      " 0 "       " 28 "      " 271 "   
 [9,] "Rockfish, grass................................. "     " 0 "       " 37 "      " 1,143 " 
[10,] "Rockfish, group bolina....................... "        " 0 "       " 0 "       " 2 "     
[11,] "Rockfish, group canary/vermili.......... "             " 0 "       " 0 "       " 0 "     
[12,] "Rockfish, group gopher..................... "          " 0 "       " 0 "       " 46 "    
[13,] "Rockfish, group nearshore................ "            " 0 "       " 0 "       " 155 "   
[14,] "Rockfish, group red........................... "       " 0 "       " 0 "       " 775 "   
[15,] "Rockfish, group rosefish.................... "         " 0 "       " 0 "       " 0 "     
[16,] "Rockfish, group shelf......................... "       " 0 "       " 0 "       " 87 "    
[17,] "Rockfish, group slope........................ "        " 0 "       " 0 "       " 0 "     
[18,] "Rockfish, group small........................ "        " 0 "       " 0 "       " 0 "     
[19,] "Rockfish, kelp.................................... "   " 0 "       " 0 "       " 63 "    
[20,] "Rockfish, rosethorn........................... "       " 0 "       " 0 "       " 0 "     
[21,] "Rockfish, rosy................................... "    " 0 "       " 0 "       " 0 "     
[22,] "Rockfish, splitnose............................ "      " 11 "      " 4 "       " 159 "   
[23,] "Rockfish, starry................................. "    " 0 "       " 0 "       " 14 "    
[24,] "Rockfish, treefish............................... "    " 0 "       " 0 "       " 123 "   
[25,] "Rockfish, unspecified........................ "        " 47 "      " 139 "     " 1,301 " 
[26,] "Rockfish, vermilion............................ "      " 0 "       " 0 "       " 30 "    
[27,] "Rockfish, widow................................ "      " 0 "       " 0 "       " 1 "     
[28,] "Rockfish, yellowtail............................ "     " 0 "       " 31 "      " 0 "     
[29,] "Sablefish........................................... " " 1,877 "   " 1,096 "   " 2,737 " 
[30,] "Salmon, Chinook............................... "       " 0 "       " 0 "       " 0 "     
[31,] "Salmon.............................................. " " 0 "       " 0 "       " 0 "     
[32,] "Sanddab............................................ "  " 323 "     " 238 "     " 193 "   
[33,] "Sardine, Pacific................................. "    " 361,879 " " 966,738 " " 55,708 "
[34,] "Scorpionfish, California..................... "        " 377 "     " 97 "      " 1,931 " 
[35,] "Seabass, white.................................. "     " 3,628 "   " 2,411 "   " 1,973 " 
[36,] "Shark, Pacific angel.......................... "       " 6,880 "   " 5,894 "   " 3,243 " 
[37,] "Shark, basking.................................. "     " 0 "       " 0 "       " 0 "     
[38,] "Shark, bigeye thresher...................... "         " 0 "       " 0 "       " 0 "     
      [,5]        [,6]        [,7]        [,8]        [,9]          [,10]       [,11]      
 [1,] ""          ""          ""          ""          ""            ""          ""         
 [2,] ""          ""          ""          ""          ""            ""          ""         
 [3,] " 51 "      " 76 "      " 9 "       " 0 "       " 658 "       " 0 "       " 0 "      
 [4,] " 0 "       " 662 "     " 0 "       " 0 "       " 0 "         " 0 "       " 0 "      
 [5,] " 4 "       " 0 "       " 7 "       " 6 "       " 1,423 "     " 0 "       " 0 "      
 [6,] " 33 "      " 709 "     " 1,046 "   " 709 "     " 402 "       " 239 "     " 172 "    
 [7,] " 35 "      " 26 "      " 29 "      " 23 "      " 10 "        " 31 "      " 29 "     
 [8,] " 476 "     " 655 "     " 1,212 "   " 771 "     " 227 "       " 667 "     " 592 "    
 [9,] " 2,045 "   " 2,304 "   " 3,275 "   " 1,998 "   " 5,146 "     " 4,515 "   " 2,189 "  
[10,] " 0 "       " 29 "      " 61 "      " 40 "      " 5 "         " 145 "     " 0 "      
[11,] " 0 "       " 0 "       " 0 "       " 0 "       " 0 "         " 0 "       " 0 "      
[12,] " 19 "      " 73 "      " 103 "     " 153 "     " 203 "       " 113 "     " 0 "      
[13,] " 0 "       " 0 "       " 0 "       " 69 "      " 31 "        " 114 "     " 9 "      
[14,] " 1,148 "   " 879 "     " 1,434 "   " 802 "     " 385 "       " 323 "     " 843 "    
[15,] " 0 "       " 0 "       " 0 "       " 0 "       " 42 "        " 666 "     " 107 "    
[16,] " 0 "       " 28 "      " 66 "      " 152 "     " 156 "       " 139 "     " 0 "      
[17,] " 0 "       " 0 "       " 5 "       " 0 "       " 0 "         " 0 "       " 0 "      
[18,] " 7 "       " 14 "      " 0 "       " 0 "       " 0 "         " 0 "       " 0 "      
[19,] " 61 "      " 42 "      " 0 "       " 3 "       " 0 "         " 4 "       " 2 "      
[20,] " 0 "       " 0 "       " 0 "       " 15 "      " 0 "         " 0 "       " 0 "      
[21,] " 0 "       " 0 "       " 13 "      " 65 "      " 0 "         " 0 "       " 0 "      
[22,] " 120 "     " 30 "      " 39 "      " 207 "     " 69 "        " 211 "     " 37 "     
[23,] " 19 "      " 10 "      " 36 "      " 22 "      " 0 "         " 0 "       " 0 "      
[24,] " 169 "     " 214 "     " 463 "     " 559 "     " 142 "       " 266 "     " 62 "     
[25,] " 1,572 "   " 1,369 "   " 2,284 "   " 1,477 "   " 424 "       " 1,555 "   " 396 "    
[26,] " 27 "      " 28 "      " 85 "      " 58 "      " 0 "         " 319 "     " 172 "    
[27,] " 0 "       " 8 "       " 10 "      " 22 "      " 13 "        " 17 "      " 8 "      
[28,] " 0 "       " 0 "       " 0 "       " 0 "       " 8 "         " 0 "       " 9 "      
[29,] " 4,373 "   " 1,630 "   " 3,253 "   " 3,609 "   " 1,262 "     " 6,749 "   " 4,848 "  
[30,] " 0 "       " 220 "     " 0 "       " 0 "       " 0 "         " 860 "     " 0 "      
[31,] " 0 "       " 120 "     " 0 "       " 0 "       " 0 "         " 0 "       " 0 "      
[32,] " 403 "     " 80 "      " 10 "      " 225 "     " 2 "         " 0 "       " 109 "    
[33,] " 300,734 " " 523,580 " " 685,822 " " 283,577 " " 1,408,037 " " 345,353 " " 579,506 "
[34,] " 498 "     " 730 "     " 94 "      " 282 "     " 778 "       " 287 "     " 1,492 "  
[35,] " 90 "      " 369 "     " 24,713 "  " 21,245 "  " 4,381 "     " 1,194 "   " 4,614 "  
[36,] " 1,786 "   " 1,309 "   " 2,157 "   " 618 "     " 758 "       " 777 "     " 3,431 "  
[37,] " 0 "       " 76 "      " 0 "       " 0 "       " 0 "         " 0 "       " 0 "      
[38,] " 0 "       " 0 "       " 358 "     " 0 "       " 0 "         " 0 "       " 0 "      
      [,12]         [,13]       [,14]       
 [1,] ""            ""          ""          
 [2,] ""            ""          ""          
 [3,] " 0 "         " 0 "       " 811"      
 [4,] " 0 "         " 0 "       " 669"      
 [5,] " 0 "         " 0 "       " 1,468"    
 [6,] " 166 "       " 980 "     " 4,732"    
 [7,] " 4 "         " 5 "       " 191"      
 [8,] " 259 "       " 1,342 "   " 6,498"    
 [9,] " 1,653 "     " 2,494 "   " 26,798"   
[10,] " 123 "       " 176 "     " 581"      
[11,] " 18 "        " 5 "       " 23"       
[12,] " 77 "        " 112 "     " 899"      
[13,] " 0 "         " 0 "       " 378"      
[14,] " 865 "       " 2,529 "   " 9,984"    
[15,] " 0 "         " 0 "       " 815"      
[16,] " 0 "         " 0 "       " 626"      
[17,] " 0 "         " 0 "       " 5"        
[18,] " 0 "         " 0 "       " 21"       
[19,] " 0 "         " 18 "      " 192"      
[20,] " 0 "         " 0 "       " 15"       
[21,] " 0 "         " 0 "       " 78"       
[22,] " 0 "         " 40 "      " 927"      
[23,] " 0 "         " 0 "       " 101"      
[24,] " 22 "        " 104 "     " 2,124"    
[25,] " 987 "       " 1,993 "   " 13,545"   
[26,] " 0 "         " 14 "      " 733"      
[27,] " 0 "         " 8 "       " 87"       
[28,] " 3 "         " 12 "      " 64"       
[29,] " 3,463 "     " 5,494 "   " 40,390"   
[30,] " 0 "         " 0 "       " 1,080"    
[31,] " 0 "         " 0 "       " 120"      
[32,] " 0 "         " 43 "      " 1,625"    
[33,] " 1,065,972 " " 248,851 " " 6,825,757"
[34,] " 2,201 "     " 6,195 "   " 14,962"   
[35,] " 3,420 "     " 1,509 "   " 69,546"   
[36,] " 1,045 "     " 5,244 "   " 33,142"   
[37,] " 0 "         " 0 "       " 76"       
[38,] " 0 "         " 0 "       " 358"      

[[3]]
      [,1]                                                     [,2]       [,3]      [,4]     
 [1,] "alifornia Waters"                                       ""         ""        ""       
 [2,] "Fishes"                                                 ""         ""        ""       
 [3,] "Shark, blue........................................ "   " 0 "      " 0 "     " 0 "    
 [4,] "Shark, brown smoothhound............... "               " 285 "    " 174 "   " 431 "  
 [5,] "Shark, dusky..................................... "     " 0 "      " 0 "     " 371 "  
 [6,] "Shark, gray smoothhound................. "              " 0 "      " 0 "     " 0 "    
 [7,] "Shark, horn....................................... "    " 0 "      " 21 "    " 0 "    
 [8,] "Shark, leopard................................... "     " 324 "    " 512 "   " 574 "  
 [9,] "Shark, pelagic thresher..................... "          " 0 "      " 0 "     " 0 "    
[10,] "Shark, sevengill................................. "     " 4 "      " 0 "     " 7 "    
[11,] "Shark, shortfin mako......................... "         " 174 "    " 20 "    " 15 "   
[12,] "Shark, sixgill...................................... "  " 0 "      " 0 "     " 0 "    
[13,] "Shark, smooth hammerhead............. "                 " 0 "      " 0 "     " 0 "    
[14,] "Shark, soupfin................................... "     " 1,765 "  " 1,890 " " 823 "  
[15,] "Shark, spiny dogfish.......................... "        " 0 "      " 0 "     " 6 "    
[16,] "Shark, thresher.................................. "     " 18,674 " " 2,903 " " 86 "   
[17,] "Shark, unspecified............................. "       " 8 "      " 45 "    " 78 "   
[18,] "Shark, white...................................... "    " 0 "      " 0 "     " 0 "    
[19,] "Sheephead, California....................... "          " 4,326 "  " 1,010 " " 4,750 "
[20,] "Silversides......................................... "  " 0 "      " 0 "     " 0 "    
[21,] "Skate, California................................ "     " 1,779 "  " 0 "     " 0 "    
[22,] "Skate, unspecified............................. "       " 20 "     " 0 "     " 113 "  
[23,] "Smelts, true....................................... "   " 0 "      " 0 "     " 0 "    
[24,] "Sole, Dover....................................... "    " 0 "      " 0 "     " 544 "  
[25,] "Sole, English..................................... "    " 96 "     " 20 "    " 62 "   
[26,] "Sole, fantail....................................... "  " 0 "      " 0 "     " 0 "    
[27,] "Sole, petrale...................................... "   " 6 "      " 369 "   " 82 "   
[28,] "Sole, rex............................................ " " 0 "      " 0 "     " 5 "    
[29,] "Sole, unspecified............................... "      " 372 "    " 620 "   " 1,206 "
[30,] "Surfperch, black................................ "      " 0 "      " 0 "     " 0 "    
[31,] "Surfperch, rainbow............................ "        " 4 "      " 0 "     " 0 "    
[32,] "Surfperch, rubberlip........................... "       " 0 "      " 2 "     " 0 "    
[33,] "Surfperch, unspecified....................... "         " 2 "      " 5 "     " 44 "   
[34,] "Swordfish.......................................... "   " 2,126 "  " 0 "     " 0 "    
[35,] "Thornyhead, longspine...................... "           " 1,823 "  " 479 "   " 1,666 "
[36,] "Thornyhead, shortspine..................... "           " 3,811 "  " 2,807 " " 6,480 "
[37,] "Thornyheads..................................... "      " 1,520 "  " 250 "   " 1,096 "
[38,] "Tomcod, Pacific................................ "       " 0 "      " 0 "     " 0 "    
      [,5]      [,6]      [,7]      [,8]      [,9]      [,10]     [,11]     [,12]      [,13]    
 [1,] ""        ""        ""        ""        ""        ""        ""        ""         ""       
 [2,] ""        ""        ""        ""        ""        ""        ""        ""         ""       
 [3,] " 73 "    " 0 "     " 0 "     " 0 "     " 0 "     " 0 "     " 0 "     " 0 "      " 0 "    
 [4,] " 111 "   " 107 "   " 151 "   " 51 "    " 31 "    " 95 "    " 131 "   " 1,273 "  " 472 "  
 [5,] " 0 "     " 0 "     " 0 "     " 0 "     " 0 "     " 0 "     " 0 "     " 0 "      " 0 "    
 [6,] " 0 "     " 0 "     " 0 "     " 3 "     " 7 "     " 0 "     " 41 "    " 0 "      " 0 "    
 [7,] " 0 "     " 0 "     " 0 "     " 0 "     " 0 "     " 0 "     " 0 "     " 0 "      " 0 "    
 [8,] " 701 "   " 893 "   " 520 "   " 674 "   " 219 "   " 130 "   " 194 "   " 259 "    " 191 "  
 [9,] " 0 "     " 18 "    " 0 "     " 0 "     " 0 "     " 0 "     " 0 "     " 0 "      " 0 "    
[10,] " 0 "     " 7 "     " 0 "     " 0 "     " 10 "    " 0 "     " 0 "     " 0 "      " 0 "    
[11,] " 182 "   " 0 "     " 621 "   " 1,552 " " 82 "    " 1,116 " " 2,080 " " 1,312 "  " 193 "  
[12,] " 0 "     " 0 "     " 0 "     " 15 "    " 0 "     " 0 "     " 0 "     " 0 "      " 0 "    
[13,] " 0 "     " 0 "     " 50 "    " 0 "     " 0 "     " 0 "     " 0 "     " 0 "      " 0 "    
[14,] " 1,427 " " 631 "   " 3,236 " " 4,795 " " 3,847 " " 997 "   " 149 "   " 677 "    " 1,401 "
[15,] " 21 "    " 47 "    " 0 "     " 0 "     " 0 "     " 0 "     " 0 "     " 0 "      " 0 "    
[16,] " 393 "   " 2,692 " " 5,839 " " 4,706 " " 4,829 " " 6,813 " " 3,770 " " 21,337 " " 3,282 "
[17,] " 18 "    " 0 "     " 7 "     " 0 "     " 0 "     " 0 "     " 6 "     " 46 "     " 55 "   
[18,] " 0 "     " 0 "     " 88 "    " 61 "    " 0 "     " 0 "     " 0 "     " 0 "      " 0 "    
[19,] " 5,775 " " 3,357 " " 7,696 " " 7,458 " " 5,503 " " 9,308 " " 4,274 " " 2,036 "  " 3,144 "
[20,] " 0 "     " 0 "     " 0 "     " 0 "     " 0 "     " 5 "     " 0 "     " 0 "      " 0 "    
[21,] " 0 "     " 0 "     " 0 "     " 0 "     " 0 "     " 0 "     " 0 "     " 0 "      " 0 "    
[22,] " 698 "   " 95 "    " 0 "     " 3 "     " 385 "   " 234 "   " 0 "     " 146 "    " 175 "  
[23,] " 0 "     " 0 "     " 9 "     " 0 "     " 2 "     " 0 "     " 0 "     " 0 "      " 0 "    
[24,] " 97 "    " 130 "   " 12 "    " 0 "     " 0 "     " 0 "     " 0 "     " 0 "      " 0 "    
[25,] " 167 "   " 116 "   " 0 "     " 0 "     " 4,732 " " 0 "     " 0 "     " 112 "    " 244 "  
[26,] " 65 "    " 164 "   " 0 "     " 0 "     " 0 "     " 0 "     " 0 "     " 0 "      " 0 "    
[27,] " 259 "   " 76 "    " 30 "    " 0 "     " 3,686 " " 0 "     " 0 "     " 434 "    " 509 "  
[28,] " 0 "     " 0 "     " 8 "     " 0 "     " 0 "     " 2 "     " 120 "   " 0 "      " 0 "    
[29,] " 1,610 " " 809 "   " 292 "   " 366 "   " 444 "   " 836 "   " 2,988 " " 2,467 "  " 2,097 "
[30,] " 173 "   " 0 "     " 0 "     " 0 "     " 0 "     " 0 "     " 0 "     " 0 "      " 0 "    
[31,] " 1 "     " 0 "     " 0 "     " 0 "     " 0 "     " 0 "     " 0 "     " 0 "      " 0 "    
[32,] " 1 "     " 0 "     " 0 "     " 0 "     " 0 "     " 0 "     " 0 "     " 0 "      " 0 "    
[33,] " 44 "    " 0 "     " 5 "     " 0 "     " 5 "     " 0 "     " 0 "     " 0 "      " 0 "    
[34,] " 0 "     " 0 "     " 0 "     " 0 "     " 10 "    " 350 "   " 3,815 " " 11,157 " " 441 "  
[35,] " 3,566 " " 5,115 " " 4,317 " " 5,987 " " 1,598 " " 2,301 " " 2,600 " " 716 "    " 4,284 "
[36,] " 2,111 " " 6,945 " " 4,202 " " 6,217 " " 1,352 " " 699 "   " 420 "   " 17 "     " 1,414 "
[37,] " 2,278 " " 839 "   " 608 "   " 1,896 " " 669 "   " 266 "   " 2,115 " " 837 "    " 2,657 "
[38,] " 0 "     " 0 "     " 0 "     " 0 "     " 0 "     " 0 "     " 41 "    " 0 "      " 0 "    
      [,14]    
 [1,] ""       
 [2,] ""       
 [3,] " 73"    
 [4,] " 3,311" 
 [5,] " 371"   
 [6,] " 51"    
 [7,] " 21"    
 [8,] " 5,189" 
 [9,] " 18"    
[10,] " 28"    
[11,] " 7,346" 
[12,] " 15"    
[13,] " 50"    
[14,] " 21,639"
[15,] " 74"    
[16,] " 75,324"
[17,] " 263"   
[18,] " 149"   
[19,] " 58,636"
[20,] " 5"     
[21,] " 1,779" 
[22,] " 1,868" 
[23,] " 11"    
[24,] " 783"   
[25,] " 5,549" 
[26,] " 229"   
[27,] " 5,452" 
[28,] " 135"   
[29,] " 14,106"
[30,] " 173"   
[31,] " 5"     
[32,] " 3"     
[33,] " 105"   
[34,] " 17,898"
[35,] " 34,452"
[36,] " 36,475"
[37,] " 15,030"
[38,] " 41"    

[[4]]
      [,1]                                                     [,2]           [,3]          
 [1,] "alifornia Waters"                                       ""             ""            
 [2,] "Fishes"                                                 ""             ""            
 [3,] "Tuna, albacore.................................. "      " 240 "        " 0 "         
 [4,] "Tuna, bluefin..................................... "    " 0 "          " 0 "         
 [5,] "Tuna, skipjack, black......................... "        " 0 "          " 0 "         
 [6,] "Tuna, skipjack................................... "     " 0 "          " 0 "         
 [7,] "Tuna, unspecified.............................. "       " 12 "         " 0 "         
 [8,] "Tuna, yellowfin.................................. "     " 0 "          " 0 "         
 [9,] "Whitefish, ocean................................ "      " 869 "        " 81 "        
[10,] "Whiting, Pacific................................. "     " 5 "          " 0 "         
[11,] "Yellowtail........................................... " " 203 "        " 34 "        
[12,] "Crustaceans"                                            ""             ""            
[13,] "Crab, box.......................................... "   " 0 "          " 19 "        
[14,] "Crab, claws....................................... "    " 62 "         " 20 "        
[15,] "Crab, king.......................................... "  " 0 "          " 0 "         
[16,] "Crab, pelagic red............................... "      " 0 "          " 0 "         
[17,] "Crab, red rock................................... "     " 0 "          " 0 "         
[18,] "Crab, rock unspecified....................... "         " 38,916 "     " 41,080 "    
[19,] "Crab, spider....................................... "   " 2,443 "      " 2,349 "     
[20,] "Crab, yellow rock............................... "      " 27 "         " 0 "         
[21,] "Crustacean, unspecified.................... "           " 0 "          " 0 "         
[22,] "Lobster, California spiny.................... "         " 28,391 "     " 25,224 "    
[23,] "Prawn, ridgeback.............................. "        " 250,974 "    " 130,744 "   
[24,] "Prawn, spot....................................... "    " 1,475 "      " 18,966 "    
[25,] "Shrimp, mantis.................................. "      " 0 "          " 0 "         
[26,] "Shrimp, ocean (pink).................... "              " 0 "          " 0 "         
[27,] "Echinoderms"                                            ""             ""            
[28,] "Sea cucumber, unspecified............... "              " 5,641 "      " 6,434 "     
[29,] "Sea urchin, purple............................. "       " 0 "          " 141 "       
[30,] "Sea urchin, red.................................. "     " 739,888 "    " 194,690 "   
[31,] "Mollusks"                                               ""             ""            
[32,] "Clam, unspecified.............................. "       " 0 "          " 0 "         
[33,] "Limpet, unspecified........................... "        " 0 "          " 0 "         
[34,] "Mussel.............................................. "  " 20 "         " 250 "       
[35,] "Octopus, unspecified......................... "         " 113 "        " 76 "        
[36,] "Snail, sea.......................................... "  " 38 "         " 0 "         
[37,] "Snails, moon..................................... "     " 0 "          " 0 "         
[38,] "Squid, jumbo..................................... "     " 3 "          " 0 "         
[39,] "Squid, market.................................... "     " 25,142,717 " " 14,809,871 "
      [,4]          [,5]          [,6]          [,7]          [,8]          [,9]       
 [1,] ""            ""            ""            ""            ""            ""         
 [2,] ""            ""            ""            ""            ""            ""         
 [3,] " 0 "         " 0 "         " 0 "         " 0 "         " 157 "       " 0 "      
 [4,] " 0 "         " 0 "         " 0 "         " 0 "         " 0 "         " 0 "      
 [5,] " 0 "         " 0 "         " 0 "         " 0 "         " 0 "         " 0 "      
 [6,] " 0 "         " 0 "         " 0 "         " 0 "         " 0 "         " 0 "      
 [7,] " 0 "         " 0 "         " 0 "         " 0 "         " 0 "         " 0 "      
 [8,] " 0 "         " 0 "         " 0 "         " 16 "        " 218 "       " 0 "      
 [9,] " 886 "       " 778 "       " 363 "       " 591 "       " 608 "       " 390 "    
[10,] " 20 "        " 0 "         " 0 "         " 36 "        " 0 "         " 0 "      
[11,] " 0 "         " 35 "        " 903 "       " 1,306 "     " 259 "       " 314 "    
[12,] ""            ""            ""            ""            ""            ""         
[13,] " 5 "         " 32 "        " 37 "        " 10 "        " 5 "         " 53 "     
[14,] " 40 "        " 111 "       " 92 "        " 98 "        " 40 "        " 170 "    
[15,] " 27 "        " 213 "       " 154 "       " 197 "       " 173 "       " 268 "    
[16,] " 0 "         " 0 "         " 0 "         " 0 "         " 0 "         " 0 "      
[17,] " 0 "         " 0 "         " 0 "         " 91 "        " 0 "         " 0 "      
[18,] " 61,751 "    " 53,411 "    " 52,238 "    " 61,279 "    " 61,320 "    " 56,422 " 
[19,] " 2,466 "     " 591 "       " 1,092 "     " 1,518 "     " 2,896 "     " 2,937 "  
[20,] " 120 "       " 70 "        " 0 "         " 0 "         " 0 "         " 3 "      
[21,] " 0 "         " 33 "        " 0 "         " 0 "         " 0 "         " 0 "      
[22,] " 16,235 "    " 0 "         " 0 "         " 0 "         " 0 "         " 0 "      
[23,] " 155,996 "   " 173,895 "   " 151,542 "   " 1,385 "     " 873 "       " 212 "    
[24,] " 12,853 "    " 7,353 "     " 14,065 "    " 19,152 "    " 7,475 "     " 12,107 " 
[25,] " 0 "         " 0 "         " 0 "         " 0 "         " 0 "         " 0 "      
[26,] " 0 "         " 1,299 "     " 0 "         " 83 "        " 1,021 "     " 570 "    
[27,] ""            ""            ""            ""            ""            ""         
[28,] " 61,977 "    " 67,360 "    " 62,522 "    " 183,514 "   " 75,669 "    " 62,765 " 
[29,] " 0 "         " 1 "         " 0 "         " 0 "         " 0 "         " 0 "      
[30,] " 449,937 "   " 367,268 "   " 174,820 "   " 239,998 "   " 121,832 "   " 388,896 "
[31,] ""            ""            ""            ""            ""            ""         
[32,] " 0 "         " 0 "         " 0 "         " 0 "         " 0 "         " 0 "      
[33,] " 98 "        " 0 "         " 4 "         " 0 "         " 0 "         " 0 "      
[34,] " 0 "         " 0 "         " 0 "         " 0 "         " 0 "         " 0 "      
[35,] " 21 "        " 31 "        " 26 "        " 35 "        " 37 "        " 95 "     
[36,] " 0 "         " 0 "         " 0 "         " 7 "         " 0 "         " 0 "      
[37,] " 0 "         " 0 "         " 0 "         " 0 "         " 0 "         " 0 "      
[38,] " 0 "         " 0 "         " 0 "         " 0 "         " 0 "         " 0 "      
[39,] " 3,272,556 " " 7,585,263 " " 5,147,890 " " 2,685,152 " " 1,631,741 " " 60 "     
      [,10]         [,11]          [,12]          [,13]          [,14]         
 [1,] ""            ""             ""             ""             ""            
 [2,] ""            ""             ""             ""             ""            
 [3,] " 5,335 "     " 2,152 "      " 2,956 "      " 0 "          " 10,840"     
 [4,] " 969 "       " 221 "        " 666 "        " 0 "          " 1,856"      
 [5,] " 0 "         " 48 "         " 0 "          " 0 "          " 48"         
 [6,] " 0 "         " 15 "         " 0 "          " 0 "          " 15"         
 [7,] " 0 "         " 0 "          " 0 "          " 0 "          " 12"         
 [8,] " 0 "         " 0 "          " 0 "          " 0 "          " 234"        
 [9,] " 854 "       " 937 "        " 191 "        " 433 "        " 6,981"      
[10,] " 0 "         " 0 "          " 0 "          " 0 "          " 61"         
[11,] " 925 "       " 2,331 "      " 313 "        " 14 "         " 6,636"      
[12,] ""            ""             ""             ""             ""            
[13,] " 0 "         " 27 "         " 7 "          " 16 "         " 211"        
[14,] " 2 "         " 0 "          " 38 "         " 100 "        " 773"        
[15,] " 167 "       " 0 "          " 0 "          " 0 "          " 1,198"      
[16,] " 0 "         " 10 "         " 0 "          " 0 "          " 10"         
[17,] " 0 "         " 0 "          " 0 "          " 0 "          " 91"         
[18,] " 59,754 "    " 42,606 "     " 29,844 "     " 38,288 "     " 596,908"    
[19,] " 2,896 "     " 1,714 "      " 2,135 "      " 1,650 "      " 24,687"     
[20,] " 0 "         " 1,118 "      " 0 "          " 0 "          " 1,338"      
[21,] " 0 "         " 2 "          " 0 "          " 0 "          " 35"         
[22,] " 0 "         " 58,935 "     " 40,474 "     " 34,789 "     " 204,048"    
[23,] " 785 "       " 259,772 "    " 175,907 "    " 123,541 "    " 1,425,626"  
[24,] " 15,158 "    " 12,142 "     " 133 "        " 0 "          " 120,878"    
[25,] " 0 "         " 2 "          " 0 "          " 26 "         " 28"         
[26,] " 41 "        " 0 "          " 0 "          " 0 "          " 3,013"      
[27,] ""            ""             ""             ""             ""            
[28,] " 13,689 "    " 627 "        " 2,203 "      " 3,245 "      " 545,646"    
[29,] " 0 "         " 3,895 "      " 2,854 "      " 627 "        " 7,518"      
[30,] " 447,407 "   " 572,460 "    " 507,811 "    " 514,112 "    " 4,719,120"  
[31,] ""            ""             ""             ""             ""            
[32,] " 0 "         " 0 "          " 0 "          " 31 "         " 31"         
[33,] " 0 "         " 13 "         " 0 "          " 0 "          " 114"        
[34,] " 50 "        " 0 "          " 0 "          " 0 "          " 320"        
[35,] " 30 "        " 75 "         " 22 "         " 25 "         " 585"        
[36,] " 30 "        " 0 "          " 0 "          " 0 "          " 75"         
[37,] " 304 "       " 50 "         " 70 "         " 0 "          " 424"        
[38,] " 0 "         " 0 "          " 0 "          " 0 "          " 3"          
[39,] " 2,041,590 " " 11,331,996 " " 34,792,518 " " 39,050,200 " " 147,491,553"

[[5]]
      [,1]                                                       [,2]           [,3]          
 [1,] "alifornia Waters"                                         ""             ""            
 [2,] "Mollusks"                                                 ""             ""            
 [3,] "Whelk, Kellet's................................... "      " 1,417 "      " 1,186 "     
 [4,] "Waters Area Total: "                                      " 27,795,893 " " 17,662,015 "
 [5,] "ther Waters"                                              ""             ""            
 [6,] "Fishes"                                                   ""             ""            
 [7,] "Dolphin (fish)................................ "          " 0 "          " 0 "         
 [8,] "Escolar.............................................. "   " 68 "         " 65 "        
 [9,] "Oilfish................................................ " " 0 "          " 0 "         
[10,] "Scorpionfish, California..................... "           " 0 "          " 0 "         
[11,] "Shark, shortfin mako......................... "           " 480 "        " 0 "         
[12,] "Shark, thresher.................................. "       " 0 "          " 0 "         
[13,] "Sheephead, California....................... "            " 0 "          " 0 "         
[14,] "Swordfish.......................................... "     " 15,100 "     " 10,227 "    
[15,] "Tuna, albacore.................................. "        " 3,072 "      " 253 "       
[16,] "Tuna, bigeye..................................... "       " 6,674 "      " 2,865 "     
[17,] "Tuna, bluefin..................................... "      " 1,585 "      " 256 "       
[18,] "Tuna, yellowfin.................................. "       " 0 "          " 0 "         
[19,] "Wahoo............................................... "    " 0 "          " 0 "         
[20,] "Crustaceans"                                              ""             ""            
[21,] "Lobster, California spiny.................... "           " 0 "          " 0 "         
[22,] "Prawn, ridgeback.............................. "          " 0 "          " 0 "         
[23,] "Waters Area Total: "                                      " 26,979 "     " 13,666 "    
[24,] "Grand Total: "                                            " 27,822,872 " " 17,675,681 "
      [,4]          [,5]          [,6]          [,7]          [,8]          [,9]         
 [1,] ""            ""            ""            ""            ""            ""           
 [2,] ""            ""            ""            ""            ""            ""           
 [3,] " 2,353 "     " 1,811 "     " 1,640 "     " 1,616 "     " 1,734 "     " 3,915 "    
 [4,] " 4,364,537 " " 8,923,809 " " 6,905,513 " " 5,318,798 " " 3,087,396 " " 2,954,437 "
 [5,] ""            ""            ""            ""            ""            ""           
 [6,] ""            ""            ""            ""            ""            ""           
 [7,] " 0 "         " 0 "         " 897 "       " 0 "         " 590 "       " 0 "        
 [8,] " 0 "         " 0 "         " 338 "       " 0 "         " 0 "         " 0 "        
 [9,] " 0 "         " 0 "         " 0 "         " 0 "         " 55 "        " 0 "        
[10,] " 0 "         " 0 "         " 0 "         " 0 "         " 0 "         " 0 "        
[11,] " 1,289 "     " 0 "         " 15 "        " 0 "         " 127 "       " 0 "        
[12,] " 0 "         " 0 "         " 0 "         " 0 "         " 0 "         " 0 "        
[13,] " 0 "         " 0 "         " 0 "         " 0 "         " 0 "         " 0 "        
[14,] " 10,750 "    " 0 "         " 6,813 "     " 0 "         " 1,690 "     " 0 "        
[15,] " 0 "         " 0 "         " 145 "       " 0 "         " 8,080 "     " 0 "        
[16,] " 1,875 "     " 0 "         " 3,296 "     " 0 "         " 656 "       " 0 "        
[17,] " 0 "         " 0 "         " 0 "         " 0 "         " 0 "         " 0 "        
[18,] " 369 "       " 0 "         " 29 "        " 0 "         " 35 "        " 0 "        
[19,] " 0 "         " 0 "         " 0 "         " 0 "         " 27 "        " 0 "        
[20,] ""            ""            ""            ""            ""            ""           
[21,] " 0 "         " 0 "         " 0 "         " 0 "         " 0 "         " 0 "        
[22,] " 0 "         " 0 "         " 0 "         " 0 "         " 0 "         " 0 "        
[23,] " 14,283 "    " 0 "         " 11,533 "    " 0 "         " 11,260 "    " 0 "        
[24,] " 4,378,820 " " 8,923,809 " " 6,917,046 " " 5,318,798 " " 3,098,656 " " 2,954,437 "
      [,10]         [,11]          [,12]          [,13]          [,14]         
 [1,] ""            ""             ""             ""             ""            
 [2,] ""            ""             ""             ""             ""            
 [3,] " 2,888 "     " 1,033 "      " 2,745 "      " 4,916 "      " 27,253"     
 [4,] " 3,429,525 " " 13,290,556 " " 37,357,055 " " 40,243,170 " " 171,332,704"
 [5,] ""            ""             ""             ""             ""            
 [6,] ""            ""             ""             ""             ""            
 [7,] " 0 "         " 1,576 "      " 0 "          " 0 "          " 3,063"      
 [8,] " 0 "         " 90 "         " 0 "          " 120 "        " 681"        
 [9,] " 0 "         " 0 "          " 0 "          " 300 "        " 355"        
[10,] " 0 "         " 0 "          " 7 "          " 0 "          " 7"          
[11,] " 0 "         " 682 "        " 0 "          " 510 "        " 3,103"      
[12,] " 20 "        " 0 "          " 0 "          " 0 "          " 20"         
[13,] " 0 "         " 0 "          " 10 "         " 0 "          " 10"         
[14,] " 0 "         " 12,333 "     " 0 "          " 8,108 "      " 65,021"     
[15,] " 0 "         " 1,006 "      " 0 "          " 1,814 "      " 14,370"     
[16,] " 0 "         " 1,243 "      " 0 "          " 981 "        " 17,590"     
[17,] " 0 "         " 186 "        " 0 "          " 0 "          " 2,027"      
[18,] " 0 "         " 0 "          " 0 "          " 0 "          " 433"        
[19,] " 0 "         " 0 "          " 0 "          " 0 "          " 27"         
[20,] ""            ""             ""             ""             ""            
[21,] " 0 "         " 0 "          " 58 "         " 0 "          " 58"         
[22,] " 0 "         " 838 "        " 0 "          " 0 "          " 838"        
[23,] " 20 "        " 17,954 "     " 75 "         " 11,833 "     " 107,603"    
[24,] " 3,429,545 " " 13,308,510 " " 37,357,130 " " 40,255,003 " " 171,440,307"
x
View(x)
Q
---
title: "ahnold_labbook"
author: "Dan Ovando"
date: "December 14, 2016"
output: 
  html_notebook:
    toc: yes
    toc_float: yes
---

```{r libraries, echo=F, message=F, warning=F, include=F}
library(tidyverse)
library(forcats)
library(rstanarm)
library(lubridate)
library(ggsci)
library(broom)
library(car)
library(stringr)
library(rvest)

demons::load_functions(func_dir = '../functions')

```

A running labbook for ideas in the Ahnold project

## Bioregions

Looking at @Hamilton2010, bioregions clearly play a role. You could include them simply as covariates. But, that doesn't account for the potential interactions between MPAs and regions. Might need to either run them as seperate models (worth it to see), with interaction effects with MPA to see if there are differences among bioregions in MPA effects

## Interspecies Interactions

Can you look at the spatial-covariance matrix of these species?

Or is there enough data to sugihara the damn thing, and ask whether some species seem to cause decreases in others?

## Landings data

*[Can be found here](https://www.wildlife.ca.gov/Fishing/Commercial/Landings#260041375-2015)
* Can pretty easily figure out a way to scrape all this


## Fixed effects, random effects, and errors in variables, and clustering

It's important that you get this straightened out. 

The "errors in variables" relates to the idea that your covariate data are now random draws from some distribution. So, for example, you observe length data, which you fit to a model that predicts length data and a standard deviation, and you pass those predicted length data to some other part of the model. See box 6.2.2 in bayesian primer. This is just basically saying as another random variable in the liklihood, instead of taking it as given (where you would just use the length data)

Basically, what I'm trying to figure out here is how to properly deal with the the clustering of the data in the time series. So, for any categorical variable that is at a higher level than the data, it seems reasonable to just give it a prior (e.g. all the year coefficients share a common prior), all the species effects share a prior by group, etc, where all of those share a common prior. 

What do you do about things like linf though? Repeated a whole ton over time, but you can't just give it a hierarchical prior.... Looking at Gelman and Hill


OOOOOOOHHHHHH, maybe this is the right approach. Give it species fixed effects, and then make the species fixed effects a regression of species effects coefficients on life history traits. So, a multi-level modeling approach, like Gelman and Hill page 241. 

From a hierarchichal perspective then, the idea is now to think of these as a bunch of nested regressions for each level of the data, with the right random effects linking each layer together. 

So, you have year terms, and those year terms are a function of yearly things like temperature and el niño, species terms, etc. The problem that you've been having, and this probably explains some of the convergance issues, is that you have massive colinearity when you try and include fixed effects for things like species, and then the covariates at the species level? So, this allows you to sort of get the best of both worlds, where you now get the "intercepts" by the right thing, but those intercepts take into account the data at the intercept level that you have that can influence the outcome. So it looks though from Gelman and Hill page 281 that to do this, you don't have to include those terms in the mixed effects part of the regression itself, but only put them in, but with the appropriate priors. So, that's where the common prior comes in? You'll just need to mess with this one with code once you can code it in STAN

You should test this by getting the thing written in STAN, and comparing two versions, one where you do it the usual way (estimate the mean of the prior), one where you include covariates in the model, the other where you fit them seperately

  AHA@@!!@!#@!@#! THAT'S WHY THE MEAN OF THE PRIOR IS 0!!!! think about it. Say you want to include fixed effects for each site. Under the model as it get's written all over the place here, they are distributed with mean(u,sigma). Now, when you include the fixed effects mean in that context: those are deviations from a mean 0! So, you've already accounted for the change from the mean effect by including the intercept. If you then say include site specific temperature in there, you'd just chuck it in as a variable, which would do the exact same thing as saying N(site + temp,sigma), or site + temp + n(0,sigma). All those terms are getting soaked up in the intercept. Suppose that it should be N(.2,sigma). So, no matter what, ever year term is going to be .2 +- some term. That is the same then as just adding .2 to the intercept, and then setting the mean of the prior to 0. So, they real key here then is simply specifying the group-level standard deviation. Wow. 



**Everything is a random effect in bayes, the key is is it grouped**

## Model Structure

Let's go through the thing from scratch and think through how you want this thing to look, from a model diagramming perspective. 

### Hurdle component

### Data

So may be you need to be thinking about this kind of like a CPUE standardization, where on one hand you're fitting a poisson distribution to the transect data, and using that to estimate densities, based on deterministic equations and a sigma (or maybe treating that part as data to make things a bit easier), and then fitting the regression to those densities. The challenging part is how you rationalize the two distributions for density; that implied by the regressin and that implied by the error from the logit model 

AHA, so the key here is thinking about the marginal posterior of the regression coefficients, not just the regression coefficients. The sigma of the regression model us saying taking those data as observations from a ranfom event, how well does the model explain that. But, that's just under that one scenario. So, if you were doing this in a simple way where the posterior is analytical, then you get the betas out. But, the full marginal now would take into account the range of worlds that the densities could live in, so you'd sample from the joint posterior to get the betas of interest!

### Process

See box 6.2.2 in a Bayesian Primer for ideas on this. 

The way I'm leaning towards is something like this: There is a true biomass that is a random draw from a distribution with mean observed biomass and a variance defined by.... something (observer, group, a function of kelp, vis, etc.). The inverse of that, that the observed are a random draw from some true distribution with mean true and sigma above seems hard to do in this context: you'd have to fit a massive number, or possibly an unidentifiable number, of means, unless you want to say that that there's some clustering (e.g. mean by species by site by year, which I think are the raw aggregation anyways). 

Ah wait, you're making this wayyyy to complicated. The thing you're thinking about here are errors in **predictor** variables, not dependent variables like the density, which is what you're trying to explain. Under a traditional Bayesian framework, the dependent variabels are already considered random draws (until they are obsered), while the rest are considered data. But, the stuff you were thinking of there are for independent variables, things like temperature. So, you could say that temperature is imperfectyl measured, and comes from a distribution with some true mean and sigma. 

So, you don't need to mess with the "errors" in density, since it's already considered a random draw, from a distribution with mean expeced density from the model itself, and a sigma estiamted by the model. 

Now, where things could then get interesting is using a model to estimate the sigmas. i.e. sigma is a linear function of say a constant, plus kelp cover + vis + research group... etc. 

Now, that's all assuming that the biomass densities are in fact your data. There's another way you could do this, somewhat similar to @Karnauskas2011. The idea here. you have **two**ish data sources. One are the actual lengths. The other are the observed densities. The goal is still to fit a regression to the densities. The issue now though is that you're going to fit the regression to densities estimated from the raw length data. So, you'd take the length data. You'd then use that length data to build up a model of densities that you then fit the regression on. The problem is, you'd need a model relating something back to the length data, which you aren't really doing here. i.e what would the lengths be conditional on? Alternatively, you treat the lengths as data, and the densities as random variables. Now, you have the observed densities, that are draws from a distribution with mean of predicted density, where predicted density is a function of the lengths? That is a bit of double dipping though. Now one option could be to train the regression on "density" data generated from the lengths. So now, you say the observed densities are drawn from a distribution predicted by the regression, where the regression is trained on the expected densities predicted from the length data, estimating sigmas all the way. This seems maybe the most robust option, but also a royal pain in the ass. And do you really gain a whole lot from that?

It seems like you have something like three different model structures all models can share a common hierarchical nature to the data, the question really is how do you want to model errors, especially conditional on what you really care about are doing a good job on the estimators, not on the out-of-sample prediction

1. The standard model, with potential for clustering of errors in densities as a function of appropriate levels (e.g. species groups, etc. ).That's really what equation 6.2.51 in BP is doing: making a more complex model of the error structure associated with the observations. 

2. A model for standard deviations in the densities, where sigma is a linear function of an intercept and apprpriate covariates (like visibility, kelp, etc.) Could be an interesting way to test things that need to be accounted for in the model. That's really what equation 6.2.51 in BP is doing: making a more complex model of the error structure associated with the observations. 

3. A data-generation model, where the length frequency are taken as perfectly observed, or are used to generate draws from a poisson, a la @Karnauskas2011, converted to densities, and then the regression is fit to those densities, and then used to predict the observed densities. Seems like a worse and worse idea, especially if you're not as interested in prediction as you are in estimation.

### Parameters

Let's think about the covariates first. The question is basically how do you want to cluster the errors. 
#### Sites

You have site specific parameters, like location, region, etc. These stay constant at each site over time. Those should almost certainly be modeled as random effects, where maybe each site gets an intercept, drawn from the region that it's in, and each region then get's an uninformative prior, since the region's aren't really random effects, but constant enities in this world.

Now one question I have then is how does that fit in a regression framework, i.e. relative to "controlling" for the region effect in the regression itself. I don't think that it matters, you just need to stop thinking abotu things so linearly. You're only goal is to model the variation in density as a function of things, and this goes in there, the effect of region just isn't a linear additive term, but rather affects the site specific terms

### Species

You also have a bunch of species-specific effects. Now, on one level, these make more sense to me as fixed effects (the idea is that they aren't really random draws from a population, but rather "true" values). But, where I get confused then is how to deal with these correctly in a panel framework, given that they are repeated. If I remember my Gelman, the way to deal with this in the year terms for example is by giving them a hierarchichal nature. 

So, maybe one way to think of all the life history things is that each life history value is a draw from a "random" model that genrates life history traits. So, you estimate the coefficients for linf, where linf is a random variable drawn a global distribution. 

AHA, i think your confusion is in differentiating the data from the coefficients. I think the "random effects" vs "fixed effects" question relates to how you want to model the data itself, as either the data, or draws from some distribution that you then estimate as well. 

"In Bayesian hierarchical modeling, random effects are used to describe variation that occurs beyond variation that arises from sampling alone"

Hobbs, N. Thompson; Hooten, Mevin B.. Bayesian Models: A Statistical Primer for Ecologists (Page 114). Princeton University Press. Kindle Edition. 


### Priors


## Jan 3 2017

### New thoughts on model structure

OK, after doing some reading I think I'm geting a better handle on this. The key question: how do you properly account for the sampling nature of the MLPA data


The problem is you've been thinking about this rather confusedly between the unclear distinction between hierarchical and state space (see your evernore entry on state space modeling). 

So let's start from the ground up. 

1. You observe samples of length data, binned by size
2. These sample arrise from a underlying true length structure, let's say a Poisson, meaning that the mean and SD are the same. So, you fit a model that replicates this process WRONG!!!
    * The samples are the data! they are the mean values of the true poisson distribution. So, if that's the case, none of this matters ha, since there's nothing to estimate. 


If that's the case, maybe you go back to an explicit observation model, where the error in the length structures/biomass are a function of covariates. 



    
```{r}

library(tidyverse)

huh <- rpois(1000,100)

data_frame(huh = huh) %>% 
  ggplot(aes(huh)) + 
  geom_bar()

rmultinom(10, size = 12, prob = c(0.1,0.2,0.8,.1))

```

    
## 1/4/17

OK, that was a good road to head down, but not a great place to start. It seems like there's not really a huge upside to be had from the crazy length-up modeling process. At the end of the day, unless you introduce a bias term or something like that, you're just going to get the mean of the prediction back. So, you could certainly introduce a bit more uncertainty, but not really a substantive change in the outcome. 

```{r}
library(tidyverse)
dat <- read_csv('../data/UCSB_FISH raw thru 2013.csv')

dat <-  dat %>% 
  mutate(size_range = ifelse(is.na(max_tl - min_tl), 0, max_tl - min_tl ))

dat %>% 
  ggplot(aes(size_range)) +
  geom_histogram() + 
  scale_y_log10()

```
Almost all the observations have no real range to them. Looking at a histogram of the range of the sizes reported (max - min), where 0 corresponds to no range reported, almost all are within a few centimeters, which really isn't going to play our in the biomass all that much. only `r 100*mean(dat$size_range > 10)`% of samples have size ranges above 10 centimeters. So, to be really really precise, you could simulate length frequences for those, but really hardly seems worth the effort. 

The bigger question then is if you want to try and model bias in the observations; e.g. counts of certain species really should be higher/lower under different conditions. Something like a "standardized" survey index. i.e. maybe counts should be higher, taking into account poor vis or the like. Basically, it will be important if it introduces bias, or there is substantially more error around particular kinds of species. But beyond that, not like your analysis is fundamentally flawed or anything, and it's unclear how much benefit you get from controlling for those factors at the ground level, vs. controlling for them in the regression itself (i.e. all else being equal visibility drives down/up counts). 

There's still the biomass issue (converting lengths into biomass). There's obviously error in there, but probably not bias, so again, the issue would really be in just adding more noise. It's hard to know how you'd deal with that since there's not really a signal in the data to tell you anything about that error. It would be nice to work with Jen to do the translation from lengths to densities right there in the model, but again that seems like a second order thing: I don't think the existing densities are wrong or anything. The biggest thing though is that doing the densities raw would allow you to actually tally them up at the aggregation level you want, instead of taking means/medians across space and time, which could disguise some trends. Let's work on that actually. 

So, I think you're back go the important thing being doing a good job of dealing with the hierarchical nature of the data themselves. So let's spend the next few days focusing on that. 

## Data Exploration

Let's go back to square one and think a little about the nature of the data that you're dealing with, and explore the relationships of the data with density and length. 

```{r load data}


length_data <- read_csv('../data/UCSB_FISH raw thru 2013.csv') %>% 
    magrittr::set_colnames(.,tolower(colnames(.)))


conditions_data <- length_data %>% 
  group_by(site,side,year) %>% 
  summarise(mean_temp = mean(temp, na.rm = T),
            mean_kelp = mean(pctcnpy, na.rm = T),
            mean_vis = mean(vis, na.rm = T)) 
  

life_history_data <- read_csv('../data/VRG Fish Life History in MPA_04_08_11_12 11-Mar-2014.csv') %>%
  rename(classcode = pisco_classcode) %>% 
  magrittr::set_colnames(.,tolower(colnames(.)))

site_data <- read_csv('../data/Final_Site_Table_UCSB.csv') %>%
    magrittr::set_colnames(.,tolower(colnames(.)))


length_data <- length_data %>% 
  left_join(life_history_data, by = 'classcode') %>% 
  left_join(site_data, by = c('site','side'))

```

## Converting length to biomass/density

One thing I'd like to be able to do is move directly from the lengths to the density at any aggregation that I'm interested in by actually tallying the length structure. As it stands right now, you need to take means/medians to aggregate densities over time, which I don't really like

Let's take a look at the actual density data that Jen provides

```{r density data}

density_data <- read_csv('../data/ci_reserve_data_final3 txt.csv') %>%
  magrittr::set_colnames(.,tolower(colnames(.))) %>% 
  gather('concat.name','value', grep('_',colnames(.)),convert = T) %>%
  mutate(data.type = gsub('\\_.*', '', concat.name),
         classcode = gsub('.*\\_','',concat.name)) %>%
  mutate(value = as.numeric(value)) %>%
  spread(data.type,value) %>%
  rename(site_side = site.side)

density_example <- density_data %>% 
  filter(is.na(biomass) == F & biomass >0) %>% 
  sample_n(1)

density_example

```
Let's take a look and see if you can replicate this density example. 





```{r}


length_to_weight <-
  function(mean_length,
  min_length,
  max_length,
  count,
  weight_a,
  weight_b,
  length_units = 'cm',
  weight_units = 'g',
  length_for_weight_units = 'mm',
  length_type_for_weight,
  tl_sl_a,
  tl_sl_b,
  tl_sl_type,
  tl_sl_formula) {
  #
  # generate_lengths <- function(count,mean_length, min_length, max_length){
  
  if (is.na(count) | count == 0){
    
    outweight <-  0
  }  else {
    
  if (is.na(min_length) |
  is.na(max_length)) {
  #generate distribution of lengths
  
  lengths <-  rep(mean_length, count)
  
  } else{
  # lengths <-  pmax(min_length,pmin(max_length,rpois(count, lambda = mean_length)))
  lengths <- runif(count, min = min_length, max = max_length)
  }

  if (length_type_for_weight == 'SL') {
  if (tl_sl_type  == 'TYPICAL') {
  weight_lengths <-  lengths * tl_sl_a + tl_sl_b
  } else{
  weight_lengths <- (lengths - tl_sl_b) / tl_sl_a
  
  }
  
  } else {
  weight_lengths <-  lengths
  }
  
  if (length_units == 'cm' & length_for_weight_units == 'mm') {
  weight_lengths <- weight_lengths * 10
  }
  
  weight <-  weight_a * weight_lengths ^ weight_b
  
  if (weight_units == 'kg') {
  weight <- weight * 1000
  }
  outweight = sum(weight)
  }
    return(outweight)
  } #close function
  
length_example <- length_data %>% 
  filter(classcode == toupper(density_example$classcode)
, site == density_example$site, 
         side == density_example$side, year == density_example$year) 

length_example <-   length_data %>% 
  filter(is.na(commonname) == F) %>% 
  mutate(biomass_g = pmap_dbl(list(mean_length = fish_tl,
                                      min_length = min_tl,
                                      max_length = max_tl,
                                      count = count,
                                      weight_a = wl_a,
                                      weight_b = wl_b,
                                      length_type_for_weight = wl_input_length,
                                      length_for_weight_units = wl_l_units,
                                      tl_sl_a = lc.a._for_wl,
                                      tl_sl_b = lc.b._for_wl,
                                      tl_sl_type = lc_type_for_wl,
                                      tl_sl_formula = ll_equation_for_wl), length_to_weight))

length_example %>% 
  select(biomass_g)

biomass_data <- length_example %>% 
  group_by(classcode, site, side, year, transect) %>% 
  summarise(total_biomass_g = sum(biomass_g)) 
```
First, need to add back in zeros. You need a function that goes through trip by trip, and adds in zeros for all species seen at that site at some point but not on that trip. 


```{r}

species_sightings <- length_data %>% 
  group_by(site) %>% 
  summarise(species_seen = list(unique(classcode)))


biomass_data <- biomass_data %>% 
  ungroup() %>% 
  select(site,side,year, transect) %>% 
  unique() %>%  {
  pmap(
    list(
      this_site = .$site,
      this_side = .$side,
      this_year = .$year,
      this_transect = .$transect
    ),
    add_missing_fish,
    observations = biomass_data,
    species_sightings = species_sightings
  )
} %>% 
  bind_rows()

man_density_data <- biomass_data %>% 
  group_by(classcode, site, side,year) %>% 
  summarise(mean_biomass_g = mean(total_biomass_g, na.rm = T)) %>% 
  mutate(
            biomass_g_per_m2 = mean_biomass_g / (30*4),
            biomass_g_per_hectare = biomass_g_per_m2 * 10000,
            biomass_ton_per_hectare = biomass_g_per_hectare * 1e-6)


```


```{r}


```



OK! You've got hand calculated densities now, let's compare them to jens

```{r}

density_comp_plot <- density_data %>% 
  select(classcode, site, side, year, biomass) %>% 
  mutate(classcode = toupper(classcode)) %>% 
  left_join(man_density_data, by = c('classcode','site','side','year')) %>% 
  left_join(life_history_data, by = 'classcode') %>% 
  ggplot(aes(biomass,biomass_ton_per_hectare, color = commonname)) + 
             geom_abline(aes(intercept = 0, slope = 1), linetype= 2) +

           geom_point() + 
  scale_color_discrete(guide = F) + 
  labs( y = 'Biomass (tons per hectare) - calculated from lengths and weights', x = 'Biomass (tons per hectare) - from ci_reserve_data_final3 txt.csv', 
  caption = 'Resolution is at species-year-site-side')

density_comp_plot
ggsave(density_comp_plot, file = 'density comparison plot.pdf',dev = cairo_pdf)
# plotly::ggplotly(density_comp_plot)

```


Not Bad. It's a good starting point, and makes me confident that I'm understanding things right. You'll need to talk with Jen to figure out why this isn't 1:1. This also let's you compare outcomes under the two sources. 
## Exploring relationships

Let's dig into things a bit here and just look at trends in the (to start with) two versions of the database

```{r}

man_density_data <- man_density_data %>% 
  left_join(life_history_data, by = 'classcode') %>% 
  left_join(site_data, by = c('site','side'))

man_density_data %>% 
  filter(is.na(targeted) == F) %>% 
  group_by(region, year, targeted) %>% 
  summarise(median_biomass = mean(biomass_ton_per_hectare, na.rm = T)) %>% 
  ggplot(aes(year,median_biomass, color = targeted)) + 
  geom_line() + 
  facet_wrap(~region)


```

Now same thing, but with Jen's data

```{r}
density_data %>% 
  mutate(classcode = toupper(classcode)) %>% 
    left_join(life_history_data, by = 'classcode') %>% 
  filter(is.na(targeted) == F) %>% 
  group_by(region, year, targeted) %>% 
  summarise(mean_biomass = mean(biomass, na.rm = T)) %>% 
  ggplot(aes(year,mean_biomass, color = targeted)) + 
  geom_line() + 
  facet_wrap(~region)
```



And just a really quick regression exploration

```{r}



reg_data <- density_data %>% 
  mutate(classcode = toupper(classcode)) %>% 
    left_join(life_history_data, by = 'classcode') %>% 
      left_join(conditions_data, by = c('site','side','year')) %>% 
  filter(is.na(targeted) == F) %>% 
  filter(biomass > 0) %>% 
  mutate(log_biomass = log(biomass),
         mlpa_in_effect = as.numeric(year > 2003),
         fished = as.numeric(targeted == 'Targeted'),
         did = as.numeric(fished * year * mlpa_in_effect))

reg_fmla <- as.formula('log_biomass ~  as.factor(year) + fished +  as.factor(did) + trophicgroup + vbgf.linf +
                       mean_temp')


basic_reg <- lm(reg_fmla, data = reg_data)

# stan_reg <- rstanarm::stan_glm(reg_fmla, data = reg_data)

summary(basic_reg)

broom::tidy(basic_reg)

  basic_reg %>%
    broom::tidy() %>%
    filter(str_detect(term,'did')) %>%
    mutate(year = as.numeric(str_replace(term,'as.factor\\(did\\)','')) )%>%
    ggplot() +
    geom_hline(aes(yintercept = 0)) + 
    geom_pointrange(aes(x = year, y = estimate, ymin = estimate - 1.96*std.error, ymax = estimate + 1.96 * std.error)) +
    geom_smooth(aes(x = year, y = estimate), method = 'lm')

  
    # stan_reg %>%
    # broom::tidy() %>%
    # filter(str_detect(term,'did')) %>%
    # mutate(year = as.numeric(str_replace(term,'as.factor\\(did\\)','')) )%>%
    # ggplot() +
    # geom_pointrange(aes(x = year, y = estimate, ymin = estimate - 1.96*std.error, ymax = estimate + 1.96 * std.error)) + 
    #       geom_smooth(aes(x = year, y = estimate), method = 'lm')



```


```{r}

reg_data <- man_density_data %>% 
  ungroup() %>% 
  filter(is.na(targeted) == F) %>% 
  filter(biomass_ton_per_hectare > 0) %>% 
  mutate(log_biomass = log(biomass_ton_per_hectare),
         mlpa_in_effect = as.numeric(year > 2003),
         fished = as.numeric(targeted == 'Targeted'),
         did = as.numeric(fished * year * mlpa_in_effect))

reg_fmla <- as.formula('log_biomass ~  as.factor(year) + fished +  as.factor(did) + trophicgroup + vbgf.linf')


basic_reg <- lm(reg_fmla, data = reg_data)

# stan_reg <- rstanarm::stan_glm(reg_fmla, data = reg_data)

# summary(basic_reg)
# 
# broom::tidy(basic_reg)

  basic_reg %>%
    broom::tidy() %>%
    filter(str_detect(term,'did')) %>%
    mutate(year = as.numeric(str_replace(term,'as.factor\\(did\\)','')) )%>%
    ggplot() +
    geom_hline(aes(yintercept = 0)) + 
    geom_pointrange(aes(x = year, y = estimate, ymin = estimate - 1.96*std.error, ymax = estimate + 1.96 * std.error)) +
    geom_smooth(aes(x = year, y = estimate), method = 'lm')

  
    # stan_reg %>%
    # broom::tidy() %>%
    # filter(str_detect(term,'did')) %>%
    # mutate(year = as.numeric(str_replace(term,'as.factor\\(did\\)','')) )%>%
    # ggplot() +
    # geom_pointrange(aes(x = year, y = estimate, ymin = estimate - 1.96*std.error, ymax = estimate + 1.96 * std.error)) + 
    #       geom_smooth(aes(x = year, y = estimate), method = 'lm')



```


Well that's not encouraging: this suggests that the density calculation really does matter here. You'll go with Jen's data for now, but big red flag of something that needs fixing here. 



## Are unfished and fished valid controls?

There are a few ways you could think about testing for this. 

1. Does the MLPA come out as a causal factor on the unfished species?

2. Do fished speices/unfished species cause each other (do lags of fished species predict unfished and vice versa)



```{r}
reg_data <- density_data %>% 
  mutate(classcode = toupper(classcode)) %>% 
    left_join(life_history_data, by = 'classcode') %>% 
  left_join(conditions_data, by = c('site','side','year')) %>% 
  filter(is.na(targeted) == F) %>% 
  filter(biomass > 0) %>% 
  mutate(log_biomass = log(biomass),
         mlpa_in_effect = as.numeric(year > 2003),
         unfished = as.numeric(targeted != 'Targeted'),
         did = as.numeric(unfished * year * mlpa_in_effect))

reg_fmla <- as.formula('log_biomass ~  as.factor(year) + mean_temp+unfished +  as.factor(did) + trophicgroup + vbgf.linf')


basic_reg <- lm(reg_fmla, data = reg_data)

# stan_reg <- rstanarm::stan_glm(reg_fmla, data = reg_data)

summary(basic_reg)

broom::tidy(basic_reg)

  basic_reg %>%
    broom::tidy() %>%
    filter(str_detect(term,'did')) %>%
    mutate(year = as.numeric(str_replace(term,'as.factor\\(did\\)','')) )%>%
    ggplot() +
    geom_hline(aes(yintercept = 0)) + 
    geom_pointrange(aes(x = year, y = estimate, ymin = estimate - 1.96*std.error, ymax = estimate + 1.96 * std.error)) +
    geom_smooth(aes(x = year, y = estimate), method = 'lm')


```

I don't think that that's a valid comparison, since you at least assume that the fished aren't a control. You could also just do a changepoint analysis?

Alternatively, let's think about this mechanistically. If you believe that the "targeted" classification is real, then by definition the MPA has no direct effect on the fished species. The real question then is whether there are trophic interactions that cause a problem. So, why not run a regression of unfished densities as a function of targeted abundance, controlling for other crap?

```{r, include=F, eval=F}

trophic_data <- density_data %>% 
  mutate(classcode = toupper(classcode)) %>% 
    left_join(life_history_data, by = 'classcode') %>% 
  left_join(temperature_data, by = c('site','side','year')) %>% 
  filter(is.na(targeted) == F) %>% 
  group_by(targeted, region, year) %>% 
  summarise(log_mean_biomass = log(mean(biomass, na.rm = T)),
                                mean_temp = mean(mean_temp, na.rm = T)) %>% 
  spread(targeted, log_mean_biomass) %>% 
  group_by(region) %>% 
  mutate(mb_lag1 = dplyr::lag(Targeted,1),
         mb_lag2 = dplyr::lag(Targeted, 2),
         mb_lag3 = dplyr::lag(Targeted, 3),
         mb_lag4 = dplyr::lag(Targeted, 4),
         temp_lag1 = dplyr::lag(mean_temp,1),
         temp_lag2 = dplyr::lag(mean_temp, 2),
         temp_lag3 = dplyr::lag(mean_temp, 3),
         temp_lag4 = dplyr::lag(mean_temp, 4)
         )

library(rstanarm)
trophic_reg <- stan_glmer(`Non-targeted` ~ Targeted + mb_lag1 + mb_lag2 + mb_lag3 + mb_lag4 + 
                    mean_temp + ( 1 |year) + temp_lag1 + temp_lag2 + temp_lag3 + temp_lag4, data = trophic_data )

# launch_shinystan(trophic_reg)

summary(trophic_reg)

post_interval <- trophic_reg %>% 
  posterior_interval()

post_vars <- row.names(post_interval)

post_interval <- post_interval %>% 
  as_data_frame() %>% 
  mutate(term = post_vars)

trophic_reg %>% 
  posterior_interval() %>% 
  as_data_frame(row.names = rownames(.))

reg_plot <- trophic_reg %>% 
  tidy() %>% 
  ggplot() + 
  geom_point(aes(term, estimate)) + 
  geom_errorbar(data = post_interval 
              , aes(term, ymin = `5%`,
                                                                 ymax = `95%`)) +
  coord_flip()

reg_plot

```

Interesting, some evidence of effect but it's messy, and depends a lot on model specification. 

## El Niño


```{r}

a = read_html('https://www.esrl.noaa.gov/psd/enso/mei/table.html') %>% 
  html_node('body') %>% 
  html_text()


enso <- read_lines('https://www.esrl.noaa.gov/psd/enso/mei/table.html')

enso <- enso[str_detect(enso,'\t|(YEAR)')] %>% 
  write('enso.txt')

enso <-  read.csv('enso.txt', sep = '\t', header = F)

table_names <- enso$V1[1] %>% 
  as.character() %>% 
  str_split(boundary('word'), simplify = T) %>% 
  as.character() %>% 
  tolower()

enso <- enso %>% 
  slice(-1) %>% 
  as_data_frame()

colnames(enso) <-  table_names

enso <- enso %>% 
  gather('bimonth','enso',-year)

enso %>% 
  mutate(year = year %>% as.character() %>% as.numeric()) %>% 
  group_by(year) %>% 
  summarise(mean_enso = mean(enso)) %>% 
  ggplot(aes(year, mean_enso)) + 
  geom_point()

 enso <- read_table("http://www.esrl.noaa.gov/psd/gcos_wgsp/Timeseries/Data/nino34.long.anom.data",
            na = c("-99.99", "99.99",'-99'), skip = 1, n_max = lubridate::year(Sys.time()) - 1870 + 1,
            col_names = c("year", 1:12)) %>%
 gather(month, enso, -year) %>%
 mutate(month = as.double(month))

```

## Get PDO

```{r}

 pdo <- read_table("https://www.esrl.noaa.gov/psd/gcos_wgsp/Timeseries/Data/pdo.long.data",
            na = c("-99.99", "99.99",'-99'), skip = 1, n_max = lubridate::year(Sys.time()) - 1900,
            col_names = c("year", 1:12)) %>%
 gather(month, pdo, -year) %>%
 mutate(month = as.double(month),
        date = lubridate::ymd(paste(year,month,'01', sep = '-')))

pdo %>% 
  filter(year >= 2000) %>% 
  ggplot(aes(date,pdo)) +
  geom_hline(aes(yintercept = 0), linetype = 2) +
  geom_point()


```



## Causal defense ideas

How can you defend the causal nature of the results?

Include a "lead" on the DiD term: This is turned on in the years leading up to the policy, and says that the policy is going to happen. SHould be insignificant if the model is right. Look back at mostly harmless and Olivier's note

How well does out-of-sample prediction help you with causality? At it's face it gives evidence that your model is doing a good job of describing the data. But does it imply help with causality? Suppose you had a model that said predict if it's raining outside based on umbrellas. It's out of sample prediction would be great, but that doesn't mean that opening umbrellas is going to cause rain. 

## Regression Exploration

Goal of this section is to start really digging into the regressions.

### Data Exploration

Let's stick with Jen's data for now, but cognizant that you need to look into reproducing and sensitivities to transformation assumptions

Pulling in Jen's density data, merging some useful temperature, site, life history, enso, and PDO data

```{r load data again}

density_data <- read_csv('../data/ci_reserve_data_final3 txt.csv') %>%
  magrittr::set_colnames(.,tolower(colnames(.))) %>% 
  gather('concat.name','value', grep('_',colnames(.)),convert = T) %>%
  mutate(data.type = gsub('\\_.*', '', concat.name),
         classcode = gsub('.*\\_','',concat.name)) %>%
  mutate(value = as.numeric(value)) %>%
  spread(data.type,value) %>%
  rename(site_side = site.side)

length_data <- read_csv('../data/UCSB_FISH raw thru 2013.csv') %>% 
    magrittr::set_colnames(.,tolower(colnames(.)))

temperature_data <- length_data %>% 
  group_by(site,side,year) %>% 
  summarise(mean_temp = mean(temp, na.rm = T))

life_history_data <- read_csv('../data/VRG Fish Life History in MPA_04_08_11_12 11-Mar-2014.csv') %>%
  rename(classcode = pisco_classcode) %>%
  mutate(classcode = tolower(classcode)) %>% 
  magrittr::set_colnames(.,tolower(colnames(.)))

life_names <- c('classcode',colnames(life_history_data)[!colnames(life_history_data) %in% colnames(density_data)])

life_history_data <- life_history_data[ , life_names]

site_data <- read_csv('../data/Final_Site_Table_UCSB.csv') %>%
    magrittr::set_colnames(.,tolower(colnames(.)))

site_names <- c('site','side',colnames(site_data)[!colnames(site_data) %in% colnames(density_data)])

site_data <- site_data[,site_names]

enso <- read_csv('../data/enso.csv') %>% 
  group_by(year) %>% 
  summarise(mean_enso = mean(enso, na.rm = T))

pdo <- read_csv('../data/pdo.csv') %>% 
group_by(year) %>% 
  summarise(mean_pdo = mean(pdo, na.rm = T))

comp_data <- density_data %>% 
  left_join(temperature_data, by = c('site','side','year')) %>% 
  left_join(life_history_data, by = 'classcode') %>% 
  left_join(site_data, by = c('site','side')) %>% 
  left_join(enso, by = 'year') %>% 
  left_join(pdo, by = 'year')
  
```

### Data Composition

Let's look at the distribution of sample size over time; where are your data coming from?

```{r}

comp_data %>% 
  group_by(year) %>% 
  summarise(num_obs = length(biomass)) %>% 
  ggplot(aes(year, num_obs)) + 
  geom_point()
  
```

```{r}
comp_data %>% 
  group_by(year,targeted) %>% 
  summarise(num_obs = length(biomass)) %>% 
  ggplot(aes(year, num_obs, fill = targeted)) + 
  geom_bar(stat = 'identity')


```

Huh, so a bunch of the observations are "unknown" on targeting status. Let's look into this. Beyond that though, the targeted and non-targeted are fairly well balanced. 

```{r}

huh <- comp_data %>% 
  filter(is.na(targeted))

sort(unique(huh$classcode))

```
Aha, these appear to be a a bunch of misc. critters and classification for understory cover. Aslo "young of the year". Probably best to filter these guys out. 

```{r}

comp_data %>% 
  filter(is.na(commonname) == F) %>% 
   group_by(year) %>% 
  summarise(num_obs = length(biomass)) %>% 
  ggplot(aes(year, num_obs)) + 
  geom_point()
  
```

```{r}

comp_data %>% 
  filter(is.na(commonname) == F) %>% 
  group_by(year,region) %>% 
  summarise(num_obs = length(biomass)) %>% 
  ggplot(aes(year, num_obs, fill = region)) + 
  geom_bar(stat = 'identity')

```
Huh, worth noting that a few of the islands only come in after 2003. What is this doing to your effect that you're basically brining in a bunch of new islands with different effects right after implementation of MPAs in 2003?

```{r}

comp_data %>% 
  filter(is.na(commonname) == F) %>% 
  group_by(year,broadtrophic) %>% 
  summarise(num_obs = length(biomass)) %>% 
  ggplot(aes(year, num_obs, fill = broadtrophic)) + 
  geom_bar(stat = 'identity')

```



### Effect Exploration

Let's look at the change in density as a functino of a variety of variables

```{r}

candidates <- c('biomass','year','site','side','classcode',
                'region','broadtrophic','reserve','campus','mpaareanm2','mean_temp','guild','trophicgroup','targeted',
                'mean_enso','mean_pdo','vbgf.k','vbgf.linf','max_length_fishbase')

reg_data <- comp_data[,candidates] %>% 
  filter(is.na(biomass) == 0, is.na(targeted) == F) %>% 
  mutate(log_density = log(biomass),
         factor_year = as.factor(year),
         targeted = as.numeric(targeted == 'Targeted'),
         post_mlpa = as.numeric(year >= 2003),
         reserve = as.numeric(reserve == 'IN'),
         did = as.factor(targeted * post_mlpa * year)
         )

```

```{r}

reg_data %>% 
  group_by(year,targeted) %>% 
  summarise(mean_density = mean(biomass, na.rm = T)) %>% 
  ggplot(aes(year,mean_density, color = factor(targeted))) + 
  geom_line()

```

```{r}

reg_data %>% 
  group_by(year,targeted,region) %>% 
  summarise(mean_density = mean(log(biomass + 1e-3), na.rm = T)) %>% 
  ggplot(aes(year,mean_density, color = factor(targeted))) + 
  geom_line() + 
  facet_wrap(~region) + 
  ylab('log mean density')

```

One troubling pattern, both ANA and SCI show that steep drop in densities from 2000-2003, which might really be skewing the effect of the MPAs (that early negative effect), though on the plus side it's both targetted and non-targeted 

Let's look at some other factors here

```{r}

reg_data %>% 
  group_by(broadtrophic) %>% 
  summarise(mean_density = mean(biomass, na.rm = T)) %>% 
  ggplot(aes(broadtrophic,mean_density, fill = broadtrophic)) + 
  geom_bar(stat = 'identity')

```
No surprises, higher density of herbivores

```{r}

reg_data %>% 
  group_by(region) %>% 
  ggplot(aes(region,biomass, fill = region)) + 
  geom_boxplot()

```

```{r}
reg_data %>% 
  select(biomass,year,mean_temp, mean_pdo, mean_enso) %>% 
  gather(metric,value, contains('mean')) %>% 
  group_by(year,metric) %>% 
  summarise(mean_biomass = log(mean(biomass, na.rm = T)), mean_value = mean(value, na.rm = T)) %>% 
  ggplot(aes(mean_value,mean_biomass, fill = metric)) + 
  geom_abline(aes(intercept = mean(mean_biomass, na.rm = T), slope = 0)) + 
  geom_point(shape = 21) + 
  facet_wrap(~metric, scales = 'free_x')
```

Interesting, definitely some noise in here, but seems to a a trend towards the "mean" of the environmental covariates. Good evidence for throwing a quadratic in here. 

```{r}

environmental_cor = reg_data %>% 
  filter(is.na(mean_temp) == F) %>% 
  select(mean_temp, mean_enso, mean_pdo, year) %>%
  cor() 

corrplot::corrplot(environmental_cor)

```
Quite a bit of correlation between some of the environmental variables, but you don't really care about estimating those precisely so not a huge deal. 

But, the year thing could be a bit messy, since you have such poor contrast before and after MPA, so might need to look at mean PDO as a problem variable. 

### DiD Structure

Now, what should the actual difference in difference look like?

The key thing you need is treatment on the treated, so that's being a fished species post 2003. 

If you include species specific fixed effects, then you don't have to include the general effect of being a fished species, since those will all come out in the wash (i.e. the fished effect is internalized in the species specific fixed effects, or alternatively, you say that the species level effect is a function of covariates, including being fished)

Similarly with post-mpa. If you don't include year fixed effects, then you need it. But, if you include year fixed effects, then the "post mpa" effect should be soaked up, and in fact perfectly colinear with, the year fixed effects. 

So, now, the real pain in the ass is the year terms. Which I think you figured out! 

The question then is how do you control for other things. 

You could just include the DiD term 

### Bare-bones regression

Let's start with a bare bones regression, using `rstanarm`, one hierarchichal, one not. You'll then compare your results to one where you code the likelihood yourself 

```{r, eval = F}

reg_vars <- c('log_density','factor_year', 'year','targeted', 'did', 'region' ,
'mean_enso','mean_pdo', 'mean_temp','classcode','site','side','post_mlpa','region',
'broadtrophic')

has_all <- function(x) any(is.na(x)) == F

pos_reg_data <- reg_data %>% 
  select(-mean_temp) %>% 
  left_join(conditions_data %>% select(site,side,year, contains('mean')), by = c('site','side','year')) %>% 
  filter(biomass >0, year >=2000) %>% 
  select_(.dots = as.list(reg_vars)) %>% 
  mutate(has_all_vars = apply(.,1,has_all)) %>% 
  filter(has_all_vars == T) %>% 
  map2_df(colnames(.), center_scale, omit_names = c('log_density','year','mean_enso','mean_pdo')) %>%
  mutate(did_dummy = 1 * targeted,
         did_year = paste('did',year, sep = '_')) %>% 
  spread(did_year,did_dummy, fill = 0) %>% 
  mutate(temp2 = mean_temp^2,
         pdo2 = mean_pdo^2,
         enso2 = mean_enso^2,
         site_side = paste(site, side, sep = '_'))

# pos_reg_data$did_2003[pos_reg_data$did_2003 == 1] <-  0.5

pos_reg_data %>% 
  group_by(factor_year,targeted) %>% 
  summarise(mb = mean(log_density, na.rm = T)) %>% 
  ggplot(aes(factor_year, mb, color = factor(targeted))) + 
  geom_point()


did_years <- paste('did',2000:2013, sep = '_')

did_year <- did_years[did_years!='did_2000']


simple_reg <-
as.formula(
paste0(
'log_density ~',
paste(did_year, collapse = '+'),
' + (1|year) + (1 + mean_temp + temp2 |classcode) + mean_enso +  mean_pdo + (1 | site_side) + (1 | site) + (1 | region) + targeted + post_mlpa'
)
)

# + vbgf.linf +
# vbgf.k +
# mean_enso + enso2 + mean_pdo + pdo2 + mean_temp  + temp2')

freq_flat_reg <- lme4::lmer(simple_reg, data = pos_reg_data)

flat_reg <- stan_glmer(simple_reg, data = pos_reg_data,chains = 1)


freq_did_plot <-  freq_flat_reg %>% 
  tidy() %>% 
  mutate(lower = estimate - 1.96 * std.error,
         upper = estimate + 1.96 * std.error) %>% 
  filter(str_detect(term,'did')) %>% 
  mutate(year = str_replace(term, 'did_','') %>% as.numeric()) %>% 
  ggplot() +
geom_hline(aes(yintercept = 0)) + 
  geom_vline(aes(xintercept = 2003), color = 'red', linetype = 2) +
  geom_pointrange(aes(year,estimate, ymax = upper, ymin = lower)) + 
  ylab('Estimated MLPA Effect') + 
  ggrepel::geom_text_repel(data = data_frame(x = 2003, y = 1), aes(x,y, label = 'MLPA Enacted'),nudge_x = 2) + 
  xlab('Year')
         
freq_did_plot

did_plot <- flat_reg %>% 
  as.data.frame() 
  
  did_plot <- did_plot[,str_detect(colnames(did_plot), 'did_')] %>% 
  gather(did,effect) %>% 
  mutate(year = str_replace(did,'did_','') %>% as.numeric()) %>% 
  group_by(year) %>% 
  summarise(mean_effect = mean(effect),
            top = sort(effect)[round(.975 * length(effect))],
            bottom = sort(effect)[round(.025 * length(effect))]) %>% 
  ggplot() + 
  geom_hline(aes(yintercept = 0)) + 
  geom_vline(aes(xintercept = 2003), color = 'red', linetype = 2) +
  geom_pointrange(aes(year,mean_effect, ymax = top, ymin = bottom)) + 
  ylab('Estimated MLPA Effect') + 
  ggrepel::geom_text_repel(data = data_frame(x = 2003, y = 1), aes(x,y, label = 'MLPA Enacted'),nudge_x = 2) + 
  xlab('Year')

did_plot
ggsave('new_mlpa_did.pdf',did_plot)

mtcars %>% 
  map_df(center_scale)

```

Now let's try and replicate with a custom STAN function

```{r, eval = F}

did_plot <- flat_reg %>% 
  as.data.frame() %>% 
  select(contains('classcode')) %>% 
  gather(classcode,effect) %>% 
  mutate(classcode = str_replace(classcode,'classcode',''))

stan_reg_data <- pos_reg_data %>% 
  mutate(year_marker = 1) %>% 
  mutate(year = paste('year',year, sep = '_')) %>% 
  spread(year,year_marker, fill = 0) %>% 
  mutate(classcode = paste('classcode',classcode, sep = '_'),
         class_marker = 1) %>% 
  spread(classcode,class_marker, fill = 0) %>% 
    mutate(site = paste('site',site, sep = '_'),
         site_marker = 1) %>% 
  spread(site,site_marker, fill = 0) %>% 
  select(log_density, contains('year_'), contains('did_'),
         contains('classcode_'), contains('site_')) %>% 
  mutate(constant = 1)

levels_to_drop <- c(min(pos_reg_data$year),sort(unique(pos_reg_data$classcode))[1],
                    sort(unique(pos_reg_data$site))[1])

stan_reg_data <- stan_reg_data %>% 
  select(-year_2000, -did_2003, -site_ANACAPA_ADMIRALS, -classcode_acor)
                    

y <- as.numeric(stan_reg_data$log_density)

x <- stan_reg_data %>% 
  select(-log_density) %>% 
  as.matrix()


stan_fit <- stan(
  file = '../scripts/ahnold_reg.stan',
  data = list(
    num_pars = dim(x)[2],
    num_obs = length(y),
    y = y,
    x = x
  ),
  chains = 4, 
  warmup = 1000,
  iter = 2000,
  cores = 4, 
  refresh = 100
)

did_terms <- which(str_detect(colnames(x),'did'))

did_plot <- stan_fit %>% 
  as.data.frame() %>% 
  select_(.dots = as.list(did_terms)) %>% 
  gather(did,effect) %>% 
  # mutate(year = str_replace(did,'did_','') %>% as.numeric()) %>% 
  group_by(did) %>% 
  summarise(mean_effect = mean(effect),
            top = sort(effect)[round(.975 * length(effect))],
            bottom = sort(effect)[round(.025 * length(effect))]) %>% 
  ggplot() + 
  geom_hline(aes(yintercept = 0)) + 
  # geom_vline(aes(xintercept = 2003), color = 'red', linetype = 2) +
  geom_pointrange(aes(1:length(mean_effect),mean_effect, ymax = top, ymin = bottom)) + 
  ylab('Estimated MLPA Effect') + 
  # ggrepel::geom_text_repel(data = data_frame(x = 2003, y = 1), aes(x,y, label = 'MLPA Enacted'),nudge_x = 2) + 
  xlab('Year')


```

OK! Things work. Moving over to `run_ahnold` for formal construction of this process

### Multi-level (hierarchichal) notes

OK, so I think I'm finally starting to get the hang of this. The confusion, from a regression point of view, is how mechanical do you have to be about the "multilevel part". 

Look at Gelman 12.15 and the code there. 

Your confusion has been, say you've got observations at the transect level, and you want to include species level covariates. In a fixed effects world this wouldn't work: you can't estiamtes species level fixed effects, along with things like vbk. that don't vary within a species. You can either include species fixed effects, or component things that define species. 

The way Gelman talks about this in multilevel modeling, is that you can instead model this in a hierarchical manner, where the coefficients of the species fixed effects are a function of things like vbk. This makes sense, but the question is the mechanics of this. My impression was that I would have to do this all manually, i.e. take the betas of the fixed effects, then write up a regression of those betas as a function of vbk etc. Makes sense but a total pain in the ass. 

Looking at Gelman page 266 though, it becomes clearer. It seems like mechanically you can do this just by converting the fixed effects to clustered random effects, and then including vbk as just another coefficient. 

e.g. `lmer(y ~ x + vbk + (1 | species))`


## Propensity Scores

```{r}

```

Add in broad bioregion effect

That could just be recovery inside and not spillover. Do you have to have spillover

They don't count the 

Look at Steve's BACI response to Ray's 

What happens to the regulaions outside in addition to the reserve. A simple model that looks at what would you expect 

Good data for the mainland 

Look into literature for priors 

I'm going to show that you can disentangle recruitment from lenghts

## 2017-03-15

### Expressing MLPA effect

Might be a good idea to show the significance statistically, but the effect through simulation. Simulate draws from the joint posterior with and without MLPA, show mean densities. Easier to understand, allows you to combine the net effect of hurdle component

## 2017-03-16

### Check in with Jen

* Check in with the idea of more local indicies of ENSO/PDO

* No big ENSO event till 2013

* Might need to explore removing painted greenling since they are bottom dwelling cryptic-ish. Tend to be counted more frequently when there wasn't much else. 

* Drop santa barbara island entirely (one off sample, 4 years)

* Let's think about parsing this by species. Fishing is very different across different species in here, what species are really driving the analysis? Are different species disproportionately driving the results 

* We have before and after data from the overflights, so could at least use that as a prior on the scale of redistribution. SAMSAP might be able to pick up the potential "blue paradox" side of things, as

* What if this is all just a recruitment signal 

* We could look at the SMURF data to get an idea of the recruitment trends 

* Something weird happened in 2008-2010 across all the CI data

* PISCO assumes that all species have the same probability of being seen anywhere in all the islands. Might be worth looking into what happens if you zero fill by data-base wide 

## 2017-21-03

Things are looking pretty strong. Basic Model diagnostics look solid for the most part, though the LMER approach is a bit skewed to the right. But no alarm bells. The biggest issue is really low effective samples sizes for several species, as well as collinearity between a few of the model parameters (load up the STAN run and launch shinystan to dig into this a bit). 

A few things you can do to try and deal with that. Re-run the model with only let's say the top 75% percentile by positive occurance species, to make sure that you're not getting too thrown by a bunch of really rare species. Can also try rerunning without painted greenling, since Jen seems to think those buggers might be a problem. I'm a little hesitant to do too much judgement based pruning of the data. Key things is does it really cause the results to break down. 

Jen also raised the issue of the zero filling. You did your zero filling by site. It seems that when Jen was going from raw lengths to densities, she was zero filling by region. So, a garibalidi should be missing from every transect in San Miguel. This seems a litle extreme to me, but you can play with this once you get the ability to move from lenghts to densities. I really want to hone in on any subjective decisions that might be driving the results, and this seems like a potentially big candidate
. 

Update: removing painted greenling has no real effect on results. Phew. 

Running now with only the top 50th percentile of species incorporated in the analysis, seeing what that does. 

One thought on the random vs. fixed effects thing: should species really be a random effect? Maybe you should only have varying slopes but not intercepts by species, so have fixed effects by species category, and then temperature and region effects by species.

Wooooo species effects don't matter either

You should go through and spell out why you think some things are random and some are fixed effects, drawing from the gelman framework, where the nature of the hierarchichal prior is really the defining characteristic 

## Sketching out Hypothesis Runs

What are the hypothetical states of the MPA world that you want to run over? One option is to specify a handful of boutique model runs: model 1, 2, 3 etc, each designed to evaluate a very specific set of circumstances. The other is to specify a set of key levels, and then monte carlo over combinations of those levers. 

Regardless, the key things seem to be

  1. Strength of density dependence
  2. Nature of density dependence
  3. initial b and f
  4. Larval movement
  5. Adult movement
  6. Age at maturity 
  7. Fleet reaction
  8. Species composition
  9. Scale of MPA relative to movement
  
The first 6 are relatively straightforward. Seems possibly easier to monte carlo and then pull out case study scenarios that you are interested in, since speed really isn't the issue here. 

You could also use that to easily create ensembles of species. 

So, for a given "run" you'd specify how many species you want to generate, and then project them and aggregate. So, you could specify one species if you want ot make it clear, or an assemblage of many different species. 

The risk with this approach is that you might just get a gigantic mess of results. But, you can easily pare this approach down to a concrete set if you want. I think this gets away from "cherry picking" scenarios that fit your story. You can present a giant array in trelliscope of outcomes, and then choose some to present in the paper. 


  
  1. Strength of density dependence
    * this is just steepness
  2. Nature of density dependence
    * You can use the babcock definitions that are already in there
  3. f
    * One option is just to say pick a random F
    * The other option is some kind of effort dynamics model, from open access to one way trip etc.
    * Seems excessive, the key thing is it overfished or not, and is it getting worse or not
    * Pick a random f, run it out
    * Stop the thing at some random point.... problem there is you can't be in "recovery"
  4. Larval movement
    * harder. Will take some thought to do this one right really. Key thing is do you want to deal with larvae or with recruits really. See how you do it in GASP  
  5. Adult movement
    * Easy enough, what proportion of adults move from each cell
  6. Age at maturity 
    * simple enough
  7. Fleet reaction
    * Dilute
    * Concentrate
  8. Species composition
  9. Scale of MPA relative to movement
    * Make the MPAs about the pattern of CI

For now, let's take the life history straight out of the data that are actually used. So, grab the species that make it through the filter and project those. So, each run will take the life history data and run with it with a random assortment of traits. 

Let's focus on keeping it simple stupid. Take the species, project forward with a random F and a random stop point, a steepness drawn from the family distribution for that species, a random density denepence form... let's spell it out


1. Draw a species from the database
2. Fill in missing non-variale life history characteristics (e.g. linf)
3. Randomly assign characteristics of interest (density dependence form, movement, etc.)
4. Create completed fish object
5. Create a fleet object
6. Fill in the fleet object with things like F, or fleet model dynamics etc. 
7. Pass a completed fish and fleet object to `sim_fishery`
8. Store results
9. Repeat runs a whole bunch of times. 
10. Aggregate as desired

# 4/13/17

As long as the errors in your dependent variable are mean 0, then there's no problem. If they are not, or if the errors are a function of your dependent variable, then you do have  problem

Look in to stratification from that book of Kyles



## Check in with COdy

Make mortality/growth rates a function of biomass

Check on lit on this 

Make plot of length on x and year on y, so you get a bunch of stacked shifts in cohorts

Maybe add in a ricker function, though might need to double check on the steepness values. You might be moving things over the to right hand side of 

# Examine OST catch data


```{r}



ost_region_catch <- read_csv('../data/SouthCoastHumanActivitiesCommercialFisheriesEconomicandSpatialData1992to2012/sc_comm_fisheries_region_landings_data_1992_2012.csv') %>% 
  set_names(colnames(.) %>% tolower())

ost_region_catch %>% 
  mutate(pounds = str_replace(pounds,',','') %>% as.numeric(),
         revenue = str_replace_all(revenue,"\\$|,",''),
         price = str_replace(`average price`,'\\$','')) %>% 
  group_by(year,fishery) %>% 
  summarise(catch_lbs = sum(pounds, na.rm = T)) %>% 
  ggplot(aes(year,catch_lbs, color = fishery)) + 
  geom_vline(aes(xintercept = 2003), linetype = 2, color = 'red') +
  geom_line()

```

Interesting, stablish throughout the SC region, though that's a pretty damn big area

```{r}

ost_port_catch <- read_csv('../data/SouthCoastHumanActivitiesCommercialFisheriesEconomicandSpatialData1992to2012/sc_comm_fisheries_port_landings_data_1992_2012.csv') %>% 
  set_names(colnames(.) %>% tolower())

ost_port_catch %>% 
  filter(port %in% c("Port Hueneme/Oxnard", "Santa Barbara","Ventura")) %>% 
  mutate(pounds = str_replace(pounds,',','') %>% as.numeric(),
         revenue = str_replace_all(revenue,"\\$|,",''),
         price = str_replace(`average price`,'\\$','')) %>% 
  group_by(year,fishery) %>% 
  summarise(catch_lbs = sum(pounds, na.rm = T)) %>% 
  ggplot(aes(year,catch_lbs, color = fishery)) + 
  geom_vline(aes(xintercept = 2003), linetype = 2, color = 'red') +
  geom_line(show.legend = F) + 
  facet_wrap(~fishery, scales = 'free_y') + 
  theme(axis.text.y =  element_blank(),
        axis.text.x = element_blank())


```


# 4/14/17

I'm honestly close to out of ideas. These catch dat are pretty worring to be honest. If you buy these data as being representative of the fished species out in the channel islands, then this suggests a decreasin gcatch trend in nearshore finfish catches, not stable or increasing. That really makes it hard to explain the MPA mediated "decrease" in abundance. 

Possible explanations that remain. 

1. Some form of density dependent mortality (or ricker like dynamics)
  * Can model this

2. Catches at the islands themselves have been stable or gone up
  * Can try and get some more regional data to check on this
  
3. Ocam's razor: There was just a dramatic enough shift in sampling regimes when the MPAs went in place that pre-and-post are just not comparable in any way shape of form

4. Let's write up the current results and state of the world this weekend/next week and send that to committee for comments

# Scraping CDFW Data


```{r}

library(tidyverse)
library(lubridate)
library(stringr)
library(tabulizer)
library(tabulizerjars)

a = tabulizer::extract_tables("../data/cdfw-data/landings00_table12.pdf")

x <- a[[1]]

x <- as_tibble(x)

process_cdfw <- function(x){
  
  x <- as_tibble(x)
  
  x <- slice(x, -(1:2))

    numfoo <- function(z){
    
    z <- str_replace_all(z, '\\.|\\,','')
    
    if (any(is.na(as.numeric(z)) == F)){
      
      z = as.numeric(z)
    } else{
      z = z
    }
    
    }
    
    browser()
    x <- map_df(x, ~numfoo)
  
}

 b =  map_df()

map(a, process_cdfw)

```


